Unlocking Market Basket Analysis: How Snowflake Apps Turn Transaction Data into Revenue Growth
Imagine uncovering the hidden patterns behind every shopping cart to help predict what your customers will buy next. By turning mountains of transaction data into clear, actionable insights, businesses can tailor their offerings and boost revenue without guesswork. At Brooklyn Data, we built a Snowflake app to automate this valuable data discovery work. How? By leveraging your data in Snowflake, it performs market basket (affinity) analysis on your dataset and identifies high-value product combinations that you can promote.
What is Market Basket Analysis?
Market Basket Analysis (MBA) is a data-mining technique that scans large collections of transactions, with each “basket” (also known as a shopping cart) representing all items that a customer purchased in a single visit. MBA identifies items that tend to appear together more often than chance would predict. For example, MBA can identify that customers who buy a webcam often also pick up a ring light, or those who pick up an ergonomic mouse often also buy a wrist rest.
In practice, this means looking through thousands or even millions of sales transactions to discover patterns. The result is a set of straightforward association rules expressed in the form “IF X product THEN Y product,” accompanied by a series of metrics: support, confidence, lift, leverage, and conviction. These metrics help surface the most meaningful product relationships from your noisy data.
Armed with these insights, you can:
- Tailor your cross-selling approach. Show a targeted pop-up recommending a ring light whenever someone adds a webcam to their cart, or bundle complementary products under a single discount.
- Optimize your merchandising and layout. Place relevant items near each other to boost discoverability. Or let data drive decisions about shelf placement, category groupings, or “people who bought this also bought that” widgets.
- Inform broader promotional and pricing strategies. Identify a low-margin item that frequently co-occurs with a high-margin item, then feature that low-margin item as a loss leader to increase sales of the high-margin item.
- Build a more satisfying shopping experience. Reduce the number of irrelevant offers you show the customer, and feature combinations that genuinely matter to the customer.
By focusing on statistically significant pairings rather than guesswork, you can make promotional efforts more effective. When shoppers see offers that align with what they’re already looking for, they feel understood, which can lead to higher satisfaction and increased loyalty over time!
MBA Metrics
At its core, Market Basket Analysis is counting how often items appear together in customers’ baskets and translating those counts into basic probabilities. This is accomplished by applying simple arithmetic to transaction data: counting occurrences, then dividing to find ratios.
Let’s say you have the following transaction dataset, where each line represents a basket of items a customer purchased:
baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]
We’re interested in the relationship between buying a Webcam and buying a Monitor. In statistical jargon, the Webcam is the antecedent: the “if this” part of the relationship, the product of interest. The Monitor is the consequent: the “then this” part, or the related product. This is our “rule”: Webcam → Monitor. How do we determine if these products have a valuable relationship? MBA metrics can help us!
Side note: The “if…then” relationship isn’t temporal. Items in the basket are unordered. Instead, think of it more as, “is buying the antecedent associated with buying the consequent?”
Support
Our first metric to calculate is support. Support is a measure of how frequently the antecedent and consequent appear in the same basket. Let’s look at Webcam and Monitor:
antecedent = {"Webcam"}
consequent = {"Monitor"}
total_baskets = len(baskets)
count_A = 0 # baskets containing A
count_B = 0 # baskets containing B
count_A_and_B = 0 # baskets containing both A and B
for basket in baskets:
    has_A = antecedent.issubset(basket)
    has_B = consequent.issubset(basket)
    if has_A:
        count_A += 1
    if has_B:
        count_B += 1
    if has_A and has_B:
        count_A_and_B += 1
support_A = count_A / total_baskets
support_B = count_B / total_baskets
support_AB = count_A_and_B / total_baskets
support = support_AB
First we count up the number of baskets that have a Webcam. Then we count the number of baskets that have a Monitor. Finally, we tally up the number of transactions that have both a Webcam and a Monitor. Dividing the last result by the number of baskets gives us a value of 33%, which is the percentage of baskets in all our transaction data that contain both items. That’s the support metric for this rule.
Is this rule useful, interesting, or valuable? Maybe! But support alone can fool you when one of the items is already in a large share of baskets. Suppose you look instead at Desk Lamp → Webcam. Desk Lamps show up in nearly half of our baskets (4 of 9), so they will share a basket with plenty of other items through sheer popularity, even if there’s no real affinity. Take the extreme case: if every customer received a complimentary Desk Lamp with any purchase, the support for any rule with Desk Lamp as the consequent would equal the antecedent’s popularity. That isn’t actionable insight; it’s simply the lamp’s ubiquity showing through. This illustrates why analysts rarely rely on support in isolation: popular items can swamp the signal.
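The complimentary Desk Lamp scenario can be checked directly. The following is a minimal sketch, assuming the same nine baskets as above (repeated so the snippet runs on its own); the `support` helper and the `with_free_lamp` list are illustrative names, not part of any library.

```python
baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset` (illustrative helper)."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


# Hypothetical scenario from the text: every order gets a complimentary Desk Lamp.
with_free_lamp = [basket | {"Desk Lamp"} for basket in baskets]

webcam_share = support({"Webcam"}, with_free_lamp)
rule_support = support({"Webcam", "Desk Lamp"}, with_free_lamp)

print(f"Webcam share of baskets:       {webcam_share:.3f}")
print(f"Support of Webcam + Desk Lamp: {rule_support:.3f}")
```

Both numbers come out identical because the lamp is in every basket: the rule’s support simply restates how popular Webcams are.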
Confidence
We can expand on support by looking at Confidence. Confidence measures the conditional probability of the consequent given the antecedent, which is a rough measure of the dependency between the items:
confidence = support_AB / support_A
We can interpret this as “Given Webcam is in the basket, what are the chances of also buying a Monitor?” So, although Monitors and Webcams appear together in 33% of all baskets, once you know a Webcam is present, the chance of a Monitor joining it jumps to 75%. That’s the confidence of our rule Webcam → Monitor.
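Unlike support, confidence depends on which item plays the antecedent. As a rough illustration, the sketch below computes it in both directions on the same nine baskets; the `confidence` function is a hypothetical helper written for this example.

```python
baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def confidence(antecedent, consequent, baskets):
    """Share of antecedent baskets that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        raise ValueError("antecedent never appears in the data")
    with_both = [b for b in with_antecedent if consequent <= b]
    return len(with_both) / len(with_antecedent)


webcam_to_monitor = confidence({"Webcam"}, {"Monitor"}, baskets)
monitor_to_webcam = confidence({"Monitor"}, {"Webcam"}, baskets)

print(f"Webcam → Monitor: {webcam_to_monitor:.2f}")
print(f"Monitor → Webcam: {monitor_to_webcam:.2f}")
```

In this toy dataset every Monitor basket also contains a Webcam, so the reverse rule reaches 100% confidence while the forward rule stays at 75%.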
Lift
Lift builds on support and confidence by comparing the observed confidence with the “expected confidence,” the confidence you would see if the two items were independent. That expected value is simply the baseline probability of encountering the consequent, so lift is the ratio between the two:
lift = confidence / support_B
For Webcam and Monitor we get a Lift of 2.25. That tells us the two items appear together more than twice as often as random chance would predict. In other words, Webcam buyers are 2.25 times more likely to add a Monitor than the typical shopper. This signals a strong, positive affinity between the products.
And this brings us to an important point about MBA: no single metric is “good” or “bad” in isolation. A support of 33% or a confidence of 75% might be valuable in one context and meaningless in another. A lift of 1 means the items co-occur about as often as chance would predict; a lift below 1 flags a weaker-than-expected association, and a lift well above 1 highlights a potentially powerful pairing. But you still need to weigh it alongside support, confidence, and, most importantly, your business goals before launching a promotion or bundle.
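To see lift values on both sides of 1, here is a small sketch that scores three rules from the same baskets. The `lift` helper is an illustrative name; it rewrites confidence / support_B in terms of raw counts, which is algebraically the same ratio.

```python
baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def lift(antecedent, consequent, baskets):
    """Confidence divided by the consequent's baseline share of baskets."""
    n = len(baskets)
    n_a = sum(antecedent <= b for b in baskets)
    n_b = sum(consequent <= b for b in baskets)
    n_ab = sum((antecedent | consequent) <= b for b in baskets)
    if n_a == 0 or n_b == 0:
        raise ValueError("both items must appear at least once")
    # (n_ab / n_a) / (n_b / n), simplified to a single division
    return (n_ab * n) / (n_a * n_b)


rules = [
    ({"Webcam"}, {"Monitor"}),
    ({"Webcam"}, {"Desk Lamp"}),
    ({"Standing Desk"}, {"Ergonomic Chair"}),
]
for antecedent, consequent in rules:
    a, b = next(iter(antecedent)), next(iter(consequent))
    print(f"{a} → {b}: lift = {lift(antecedent, consequent, baskets):.4f}")
```

Webcam → Desk Lamp lands below 1 here: the two items share a basket less often than their individual popularity would suggest.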
Leverage
The next metric is Leverage, which is loosely defined as how many extra (or fewer) baskets you get beyond what chance alone would produce. The simplest way to think about Leverage is that it measures the difference between a real connection and a simple coincidence. Another way to think about it is “synergy”: it quantifies how much of the product co-occurrence is due to a genuine relationship, rather than just their individual popularity. Leverage is calculated like so:
leverage = support_AB - (support_A * support_B)
Plugging the numbers in for Webcam and Monitor we get 0.185 or 18.5%. Think of leverage as the extra or incremental share of orders that contain the two items because they tend to travel together, not just because each one is popular on its own.
- If the products had no relationship, we’d expect about 14.8% of baskets (0.444 × 0.333) to contain both.
- We actually see them together in 33% of baskets.
- The gap of 33.3% − 14.8% = 18.5 percentage points is the leverage.
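The same arithmetic can be written with exact fractions, which avoids rounding and makes the “extra baskets” reading concrete. This is only a sketch: the counts (9 baskets, 4 with a Webcam, 3 with a Monitor, 3 with both) are taken from the example data, and the variable names are illustrative.

```python
from fractions import Fraction

total_baskets = 9
count_webcam = 4   # baskets containing a Webcam
count_monitor = 3  # baskets containing a Monitor
count_both = 3     # baskets containing both

support_a = Fraction(count_webcam, total_baskets)
support_b = Fraction(count_monitor, total_baskets)
support_ab = Fraction(count_both, total_baskets)

# Share of baskets we would expect to hold both items if they were independent.
expected_together = support_a * support_b
leverage = support_ab - expected_together

# Leverage expressed as a number of baskets rather than a share.
extra_baskets = leverage * total_baskets

print(f"expected together: {expected_together} ≈ {float(expected_together):.3f}")
print(f"leverage:          {leverage} ≈ {float(leverage):.3f}")
print(f"extra baskets:     {extra_baskets} ≈ {float(extra_baskets):.2f}")
```

In count terms, independence would predict about 1.3 joint baskets out of nine; we observe 3, so roughly 1.7 baskets are “extra.”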
Conviction
Finally we have Conviction, which focuses on the miss rate: how often the antecedent shows up without the consequent. It compares that rate with the miss rate you would expect if the items were independent. In plain language, it answers: “By what factor does this rule reduce pointless recommendations?”
expected_misses = support_A * (1 - support_B) # if Webcam buyers behaved like everyone
observed_misses = support_A - support_AB # actual Webcam-with-no-Monitor cases
conviction = expected_misses / observed_misses
For Webcam → Monitor the calculation produces ≈ 2.7:
- If Webcams and Monitors were completely unrelated you would expect about 2.7 “Webcam-only” baskets in every nine orders.
- You actually see only one.
The rule therefore cuts irrelevant Monitor prompts to a little over one third (1 ÷ 2.67 ≈ 0.375) of what random targeting would deliver. A conviction above 1 means the rule misses less often than chance would; the higher the value, the more confidently you can promote the add-on or bundle without spamming customers.
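One edge case is worth handling explicitly: when the antecedent never appears without the consequent, the observed miss rate is zero and the ratio is undefined. A common convention is to treat conviction as infinite in that case. The sketch below assumes that convention; the `conviction` function is an illustrative helper, not a library call.

```python
import math

baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def conviction(antecedent, consequent, baskets):
    """Expected miss rate under independence divided by the observed miss rate."""
    n = len(baskets)
    n_a = sum(antecedent <= b for b in baskets)
    n_b = sum(consequent <= b for b in baskets)
    n_ab = sum((antecedent | consequent) <= b for b in baskets)
    if n_a == 0:
        raise ValueError("antecedent never appears in the data")
    if n_a == n_ab:
        # The rule never misses; assumed convention: report infinity.
        return math.inf
    expected_misses = (n_a / n) * (1 - n_b / n)
    observed_misses = (n_a - n_ab) / n
    return expected_misses / observed_misses


print(f"Webcam → Monitor: {conviction({'Webcam'}, {'Monitor'}, baskets):.3f}")
print(f"Monitor → Webcam: {conviction({'Monitor'}, {'Webcam'}, baskets)}")
```

Monitor → Webcam triggers the guard in this dataset, because every basket with a Monitor also contains a Webcam.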
Applying the Metrics
Our market basket analysis of the relationship between Webcam and Monitor can be summarized in this table:
Metric | Value |
---|---|
Support | 0.333333 |
Confidence | 0.75 |
Lift | 2.25 |
Leverage | 0.185185 |
Conviction | 2.666667 |
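For reference, all five metrics can be produced by one small function. This is a sketch under the definitions used in this article; `rule_metrics` is a hypothetical name, and reporting an infinite conviction when the rule never misses is an assumed convention.

```python
baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def rule_metrics(antecedent, consequent, baskets):
    """Return support, confidence, lift, leverage, and conviction for one rule."""
    n = len(baskets)
    support_a = sum(antecedent <= b for b in baskets) / n
    support_b = sum(consequent <= b for b in baskets) / n
    support_ab = sum((antecedent | consequent) <= b for b in baskets) / n
    if support_a == 0 or support_b == 0:
        raise ValueError("both sides of the rule must appear in the data")

    confidence = support_ab / support_a
    observed_misses = support_a - support_ab
    expected_misses = support_a * (1 - support_b)
    return {
        "support": support_ab,
        "confidence": confidence,
        "lift": confidence / support_b,
        "leverage": support_ab - support_a * support_b,
        # Assumed convention: a rule that never misses gets infinite conviction.
        "conviction": (
            expected_misses / observed_misses if observed_misses > 0 else float("inf")
        ),
    }


metrics = rule_metrics({"Webcam"}, {"Monitor"}, baskets)
for name, value in metrics.items():
    print(f"{name:<12}{value:.6f}")
```

Run on Webcam → Monitor, it reproduces the values summarized above.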
But remember, no single metric should determine your strategy. Instead, you can use combinations of these metrics, depending on the business context, as demonstrated below:
Business Need | Best Metric(s) | MBA Answer |
---|---|---|
“What is this rule’s impact on the total volume of co-purchases?” | Leverage | The rule adds 18.5 percentage points to the share of all transactions containing both items (33.3% observed versus 14.8% expected by chance), a significant volume boost. |
“Does A reliably predict B and not just mirror B’s popularity?” | Conviction | Yes. The rule Webcam → Monitor is incorrect about 2.7 times less often than we’d expect if the two items were independent. It’s a reliable predictor. |
“Is the relationship symmetric? I don’t care who predicts whom.” | Leverage, Lift | Yes. The Lift (2.25) and Leverage (18.5%) are the same for both Webcam → Monitor and Monitor → Webcam, showing a strong, mutual affinity. |
“Is the relationship directional? I care about recommending B when I see A.” | Confidence, Conviction | Yes. 75% of the time someone buys a Webcam, they also buy a Monitor. This gives us high confidence to make a one-way recommendation. |
“Which rules move the most product, not just ratios?” | Support × Lift, Support × Leverage | This rule has high support (33%) and high lift (2.25). This combination suggests it’s not only a strong pairing but also a frequent one, making it commercially valuable. |
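The last row of the table can be sketched as a ranking over every single-item rule in the dataset. This is a rough illustration, not a production approach: the `rank_rules` helper, the `min_support` threshold of 0.2, and the support × lift score are all assumptions chosen for this example.

```python
from itertools import permutations

baskets = [
    {"Webcam", "Monitor", "Desk Lamp"},
    {"Webcam", "Monitor"},
    {"Webcam", "Monitor", "Wireless Keyboard"},
    {"Webcam"},
    {"Standing Desk", "Ergonomic Chair"},
    {"Desk Lamp", "Wireless Keyboard"},
    {"Standing Desk", "Desk Lamp", "Wireless Keyboard"},
    {"Ergonomic Chair", "Standing Desk"},
    {"Desk Lamp"},
]


def rank_rules(baskets, min_support=0.2):
    """Score every ordered item pair by support × lift, highest first."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        support_a = sum(a in basket for basket in baskets) / n
        support_b = sum(b in basket for basket in baskets) / n
        support_ab = sum(a in basket and b in basket for basket in baskets) / n
        if support_ab < min_support:
            continue  # too rare to act on (hypothetical threshold)
        lift = support_ab / (support_a * support_b)
        rules.append({
            "antecedent": a,
            "consequent": b,
            "support": support_ab,
            "lift": lift,
            "score": support_ab * lift,
        })
    return sorted(rules, key=lambda rule: rule["score"], reverse=True)


ranked = rank_rules(baskets)
for rule in ranked:
    print(
        f"{rule['antecedent']} → {rule['consequent']}: "
        f"support={rule['support']:.3f} lift={rule['lift']:.2f} score={rule['score']:.3f}"
    )
```

Because support and lift are both symmetric, each pair shows up twice with the same score; a directional metric such as confidence would be needed to choose which way to recommend.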
See how even a handful of orders can spotlight clear cross-sell and bundling opportunities you’d never spot by intuition alone? Now imagine running those same calculations across your entire transaction history in seconds, turning raw data into prioritized, high-impact marketing actions.
How can Brooklyn Data help?
Building market-basket analysis from scratch can be a slog. You need to do data prep, metric math, rule filtering, validation, and finally turn results into something your business users can act on.
If your data already lives in Snowflake, our MBA Snowflake Native App removes most of that grunt work:
- Installs from the Snowflake Marketplace in minutes.
- Reads directly from your transactions table; no extracts, no data movement.
- Auto-generates Support, Confidence, Lift, Leverage, and Conviction for item sets that pass your thresholds.
- Outputs clean result tables you can feed to dashboards, campaigns, or an LLM for narrative insights.
By handling the repetitive plumbing, the app lets you and our consulting team focus on the fun parts, like custom tailoring the rules, validating with A/B tests, and activating the findings as new marketing or merchandising workflows.