This post is the third of a series of posts designed to walk startup founders through how we think about Customer Lifetime Value (CLV), how we calculate it (or avoid calculating it), and then how to use it for the benefit of their businesses. If you missed Part 1: The Unexamined Life is Not Worth Living and Part 2: Your Customers are Not Normal, we invite you to check them out as well.
Good News at the Early Stage: Less Math
Your most valuable customers are those with the most Recent purchases (R), the most Frequent purchases (F), and who spend the most Money per purchase (M). To use the shorthand, they have the strongest RFM scores.
Therefore, as a member of an early-stage startup team, you do not need to develop a predictive model or bother calculating CLV to get what you need at this stage of your business. Instead, allocate each customer into R, F, and M quantiles and save the predictive models for later. For the details on why this is, please see Part 2.
In this post, we will first explain why RFM bins serve as a proxy for residual customer value, or RCV, which is the portion of a customer’s CLV that has yet to be realized (see Part 1). Then we will discuss how to allocate customers to bins. In our next installment, we will discuss what to do with that information once you have it.
Observing Customer Behavior
To understand why RFM can be used to identify customers with the highest value, let’s start by looking at some customer purchase histories. In this example, the company is selling snacks to businesses. In the charts below, each row is a customer (business) and each dot is a purchase plotted along a timeline. The size of the dot represents the amount of money paid. After running a predictive model to calculate expected future RCV, the customers are sorted in descending order from the highest RCV at the top.
The first customer we zoom in on in the chart above has seven months of pretty consistent history of spending $15 on snacks about two times per week. The most recent purchase was two days ago. The predictive model predicts that this customer’s RCV is worth $3,362 in today’s dollars.
Our next customer also bought snacks two days ago and spends about the same amount ($14) on each purchase, but this business typically only buys once every three weeks or so. The predictive model forecasts that this customer is worth another $1,164 in today’s dollars. This RCV is a third of what the more frequently purchasing customer is worth above; the only meaningful difference is the purchase frequency.
The last customer in this comparison is like the previous one in terms of frequency and monetary value. However, this business hasn’t purchased any snacks in 67 days. Since it had been purchasing every other week, the likelihood that this customer has churned is high. Perhaps they found another snack provider or decided to go all-in on intermittent fasting. For this reason, the predictive model expects just $365 in future value from this customer–one-tenth of the first customer’s RCV, and one-third of the last customer’s RCV.
Generalizing Customer Behavior
Looking at customer behavior leads to the following general observations:
- While a customer is active, she is randomly buying around some purchase rate
- Customers have heterogeneous shopping habits, both in purchase rates and monetary amounts
- Some customers may already be lost, never to return
- Not all customers are equally committed to the product/service
In short, customers are complex, messy, and heterogeneous. Calculating CLV helps us celebrate that heterogeneity and use it to our advantage.
To turn these generalizations into something more mathematical, we like how Zodiac Metrics expresses CLV as a function of the following high-level factors:
The math required to calculate each of the predictive factors (inside the brackets) requires more effort than an early-stage startup has time for. Instead, we can use the inputs for each of these factors to guide us. The customers that have the best inputs have the highest RCV.
So what are the inputs? RFM!
- Churn Risk is a function of recency and frequency
- Purchase Propensity is also derived from recency and frequency, but in a different way
- Spending Habits are a function of frequency and monetary value
- (Historical Value is the sum of the customer’s transactions to date, net of variable costs)
This is the great news: if we can just figure out which customers have the best RFM scores, we identify our best customers.
Steps to Create RFM Quantiles
1. Transform Raw Transactional Data to an RFM Summary by User
Gather the transactions from your shopping cart export, Amplitude, Mixpanel, Stripe transactions, or other source that identifies individual customers, dates, and purchase amounts (net of variable costs). Then use a pivot table, SQL query, or script to aggregate the transactions by user. The result will look like the summary on the right, below.
2. Figure Out What Size Bins You Want
For RFM we want to divide customers into equally sized bins. It helps to stay between 3 bins (“terciles”) and 5 bins (“quintiles”). You might need some trial and error to arrive at what works for you. The diagram below helps illustrate the concept of bins. If your number of customers does not divide evenly by the number of bins, some bins will be smaller by one. It won’t hurt the analysis.
One other note: for labeling purposes, we put the highest number on the strongest, most valuable bins. So if we have three recency bins (terciles), #3 contains the most recent customers and #1 contains the customers who purchased the longest time ago–with #2 in the middle of those two extremes.
3. Sort by Recency to Assign Customers to Recency Bins
Using Excel: Calculate the recency based on the days since the last purchase date in the data set and store this value in its own column. Then use Excel’s data sort to put that column into a ranked order to assign bins manually, or use Excel’s PERCENTILE function. If you choose the latter, it helps to set up a small table with the thresholds for each bin and use an IF function to determine the bin. This is what you would use for 4 quartile bins:
Using Python: Use the qcut function with the appropriate parameters for your bins. Here is one example for 4 quartile bins: rfm[‘recency_quantile’] = pd.qcut(rfm[‘real_recency’].rank(method=’first’, ascending=False), 4, labels=range(1,5)).astype(‘int’)
4. Repeat the Sort-Assign Process with Frequency and Monetary Value
Unlike recency, for which the lowest, most recent value is the “best,” for frequency and monetary value the higher the better. So you will need to adjust your spreadsheet or Python formulas to account for this. Otherwise, the process of allocating customers to these bins is the same as in Step 3.
5. Visualize with a Segmentation Matrix
Once you have all of your customers allocated to R, F, and M bins, it helps to summarize your findings in a chart. Presenting the information visually helps a team discuss strategies for what to do with the different groups and develop a common vocabulary for referring to various behavioral segments. There is no one “right” way to present this information. We suggest some trial-and-error as you figure out what works for your team.
Here’s one standard way that brings together R, F, and M terciles into one matrix, with the higher customer counts in the darker blue:
If you use more than three bins, presenting all three variables–R, F, and M–can make for a very busy diagram. To simplify, many teams opt for RF matrices. This is especially useful when there is not a broad range of average order size. Here is an example of a 4×4 RF matrix:
Here’s a 5×5 RF matrix that looks similar, but includes more specific segment names:
In this installment we have seen both why you would want to segment your customers into groupings using RFM bins, and how to do it from a set of raw transactional data. The RFM bins help structure our understanding of a heterogeneous customer base by grouping them according to their purchasing behavior.