How to do purchase-based segmentation


by Thomas Vladeck

Every company needs to segment its customer base. It’s table stakes for ensuring that your positioning is compelling to a particular audience.

Many B2C businesses have difficulty connecting external segmentation studies with their actual customer base. What’s worse, most segmentation studies are uninformative, because they use algorithms that cannot handle high dimensions well (believe me, I’ve run a few of these myself). But even if you run a bulletproof study, it’s often difficult to connect these learnings to your customer file.


If you want to run a segmentation today and tie it back to your customer file, a purchase-based segmentation is a convenient and easy place to start.


What can you learn? You can learn the clusters of products your customers typically buy together (in the sense that if a customer buys one SKU, there are other SKUs that they’re more likely to buy than the average customer), and how big each of these segments is.
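To make "more likely to buy than the average customer" concrete, here is a minimal sketch of a lift calculation. The purchase matrix and the `lift` helper are hypothetical, made up for illustration, and are not part of the segmentation method itself.

```python
import numpy as np

# Hypothetical data: rows are customers, columns are SKUs,
# 1 means the customer has bought that SKU.
bought = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
])

def lift(bought, a, b):
    """Ratio of P(buys b | bought a) to P(buys b) across all customers."""
    p_b = bought[:, b].mean()                          # share of all customers who bought b
    p_b_given_a = bought[bought[:, a] == 1, b].mean()  # share of a-buyers who bought b
    return p_b_given_a / p_b

lift_0_1 = lift(bought, 0, 1)  # above 1: buyers of SKU 0 over-index on SKU 1
lift_0_2 = lift(bought, 0, 2)  # below 1: buyers of SKU 0 under-index on SKU 2
```

A lift above 1 means buyers of the first SKU are more likely than the average customer to buy the second one.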


What can you do? Once you know the different purchase-based segments, you can do a number of things. First, you can use this information to optimize your merchandising strategy, creating bundles of products that appeal to particular segments. At the customer level, you can cross-reference purchase segments with CLV estimates and learn which of your purchase segments have the highest CLV. You may want to fire one or more of your segments, and double down on others.


How does it work? Well, you typically don’t have much demographic or interest-based information about your customers. Really, all you know is what they’ve purchased. But this is incredibly rich data, if you know how to use it.

The first thing you do is create a customer-by-SKU matrix: one row per customer, one column per SKU, with each cell recording whether that customer has bought that SKU.
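As a rough sketch, the matrix can be built from a list of order lines. The `orders` data, the customer IDs, and the SKU names below are all hypothetical, and the choice of a 0/1 "ever bought" cell (rather than quantities or spend) is an assumption.

```python
import numpy as np

# Hypothetical order lines: (customer_id, sku). In practice these would
# come from your order database or an e-commerce export.
orders = [
    ("c1", "tent"), ("c1", "stove"), ("c1", "tent"),
    ("c2", "tent"), ("c2", "sleeping_bag"),
    ("c3", "yoga_mat"), ("c3", "water_bottle"),
    ("c4", "yoga_mat"),
]

customers = sorted({c for c, _ in orders})
skus = sorted({s for _, s in orders})
row = {c: i for i, c in enumerate(customers)}  # customer -> row index
col = {s: j for j, s in enumerate(skus)}       # SKU -> column index

# 1 if the customer ever bought the SKU, 0 otherwise.
# Repeat purchases of the same SKU collapse to a single 1.
matrix = np.zeros((len(customers), len(skus)), dtype=int)
for c, s in orders:
    matrix[row[c], col[s]] = 1
```

Keeping the `row` and `col` lookups around makes it easy to tie segment scores back to the customer file later.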


Once you have this matrix set up, you can use an algorithm (actually, a family of algorithms) called non-negative matrix factorization (NMF). It takes this very large matrix (you have lots of customers and lots of products) and separates it into two smaller matrices, one for your customers and one for your products.
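To show the idea, here is a minimal sketch of one member of the NMF family: the classic multiplicative-update rule that minimizes squared reconstruction error. The `nmf` function, its parameters, and the toy matrix `V` are illustrative assumptions, not necessarily what the author uses. For real work a library implementation such as scikit-learn's `sklearn.decomposition.NMF` is probably a better choice.

```python
import numpy as np

def nmf(V, n_segments, n_iter=500, seed=0, eps=1e-9):
    """Factor V (customers x SKUs) into W (customers x segments)
    and H (segments x SKUs), both non-negative, so that W @ H ~ V."""
    rng = np.random.default_rng(seed)
    n_customers, n_skus = V.shape
    W = rng.random((n_customers, n_segments)) + eps
    H = rng.random((n_segments, n_skus)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical 5-customer x 5-SKU purchase matrix.
V = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

W, H = nmf(V, n_segments=2)  # W: customer segment scores, H: SKU segment scores
```

The number of segments is something you choose. A common approach is to try a few values and keep the one whose segments are easiest to interpret.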


This information can be hard to interpret at first, but it ends up being substantially more useful than your original data. For every customer, instead of a long list of products they have bought or not-bought, you have a very short set of segment scores. And for each product, instead of a long list of customers that have either bought or not-bought, you have a very short set of segment scores.


You can think of the segment score as the “key” that links products and customers: when a customer’s scores line up with a product’s scores, that customer is much more likely to buy it. With this information, the first things you can do are:

  • Group products into categories that you’ve learned from the data
  • Group customers into segments that you’ve learned from the data
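One simple way to do this grouping, sketched below, is to assign each customer and each product to the segment where its score is highest. The score matrices are hypothetical stand-ins for NMF output, and hard assignment by highest score is an assumption. You could equally keep the full scores and treat membership as a matter of degree.

```python
import numpy as np

# Hypothetical NMF output: 4 customers x 2 segments, and 2 segments x 3 SKUs.
customer_scores = np.array([[0.9, 0.1], [0.7, 0.0], [0.1, 0.8], [0.0, 1.2]])
sku_scores = np.array([[1.1, 0.9, 0.0], [0.0, 0.2, 1.3]])

# Assign each customer / SKU to its highest-scoring segment.
customer_segment = customer_scores.argmax(axis=1)
sku_segment = sku_scores.argmax(axis=0)

# How many customers fall into each segment.
segment_sizes = np.bincount(customer_segment, minlength=customer_scores.shape[1])

# The "key" in action: multiplying the scores gives a customer-by-SKU
# affinity that is high where customer and product segments line up.
affinity = customer_scores @ sku_scores
```

Here the first customer scores high on the same segment as the first SKU, so their affinity is much higher than with the third SKU.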

Through other customer-level analyses you can…

  • Learn which purchase-based segments are more or less profitable
  • Learn which purchase-based segments come from which channels
  • Learn if some of your purchase-based segments have evolved over time (has the composition of your customer base changed over time?)
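As a sketch of the first two analyses, once each customer has a segment label you can join it to other customer-level fields. The segment labels, CLV estimates, and channel names below are hypothetical, invented purely for illustration.

```python
from collections import Counter

import numpy as np

# Hypothetical customer-level data, one entry per customer.
segment = np.array([0, 0, 1, 1, 1])                   # purchase-based segment
clv = np.array([120.0, 80.0, 300.0, 260.0, 340.0])    # CLV estimate
channel = ["search", "social", "email", "email", "search"]  # acquisition channel

# Average CLV per segment.
mean_clv = {int(s): float(clv[segment == s].mean()) for s in np.unique(segment)}

# Acquisition-channel counts per segment.
channel_mix = {
    int(s): Counter(ch for ch, seg in zip(channel, segment) if seg == s)
    for s in np.unique(segment)
}
```

The same pattern works for tracking evolution over time: replace the channel with each customer's acquisition cohort and compare segment shares across cohorts.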

Through other product-based analyses you can…

  • Improve your site merchandising through smarter product groupings
  • Create smarter bundles and offers for customers
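A possible starting point for bundles, sketched below, is to list the highest-scoring SKUs within each segment. The SKU names, the scores, and the `top_skus` helper are hypothetical, and picking the top-k by score is an assumption rather than a rule.

```python
import numpy as np

# Hypothetical SKU segment scores: 2 segments x 5 SKUs.
skus = ["tent", "stove", "sleeping_bag", "yoga_mat", "water_bottle"]
sku_scores = np.array([
    [1.2, 0.9, 0.7, 0.0, 0.1],
    [0.0, 0.1, 0.0, 1.1, 0.8],
])

def top_skus(sku_scores, skus, k=2):
    """Return the k highest-scoring SKU names for each segment."""
    order = np.argsort(-sku_scores, axis=1)[:, :k]  # descending by score
    return [[skus[j] for j in segment_row] for segment_row in order]

bundle_candidates = top_skus(sku_scores, skus)
```

These lists are only candidates. Margin, inventory, and brand fit still need a human look before anything becomes an offer.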

In summary,

Non-negative matrix factorization is one of my favorite all-purpose techniques for marketing science applications. It’s also the quickest way to deliver an actionable customer segmentation when all you have is purchase data.





Written by Thomas Vladeck

Tom was inspired to start Gradient by the cutting-edge market research performed by his advisors at Wharton, where he received his MBA in marketing and statistics. Prior to Wharton, Tom received a master’s degree from the London School of Economics and studied math at Pomona College. In a prior life, Tom produced quantitative models for global climate policy reports.