Bob Harris
Bob Harris

Reputation: 87

Calculate repeat purchase probability

I want to calculate repeat purchase probability based on the following example. I want to run this calculation on a Pandas DataFrame and I'm having a hard time figuring out how to do this incrementally for each row.

Example:

In this example, the repeat order probability of customers who have placed one order in their lifetime to placing a second order is: (350 + 130) / (350+130+1,000) = 32.43%

Here's an example using iloc to get the same result as the example but it does not iterate through the dataframe.

lst = [[1, 1000], [2, 350],
       [3, 130], [4,70], [5,30]]
    
df = pd.DataFrame(lst, columns =['order', 'order_total'])
df['repeat_purchase_prob'] = df['order_total'].iloc[1:].sum() / df['frequency_total'].iloc[0:].sum()

And here's a link to the post I've read: https://blog.digitalj2.com/understand-the-power-of-repeat-order-probability

The end result should look like this:

order order_total repeat_purchase_prob
1 1000 0.37
2 350 0.40
3 130 0.43

And so on..

Upvotes: 2

Views: 596

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35626

Try a reversed cumsum then shift and divide:

rcs = df.loc[::-1, 'order_total'].cumsum()[::-1]
df['repeat_purchase_prob'] = rcs.shift(-1) / rcs

df:

   order  order_total  repeat_purchase_prob
0      1         1000              0.367089
1      2          350              0.396552
2      3          130              0.434783
3      4           70              0.300000
4      5           30                   NaN

Optional fill NaN with 0 via div:

df['repeat_purchase_prob'] = rcs.shift(-1).div(rcs, fill_value=0)

df:

   order  order_total  repeat_purchase_prob
0      1         1000              0.367089
1      2          350              0.396552
2      3          130              0.434783
3      4           70              0.300000
4      5           30              0.000000

Complete Working Example:

import pandas as pd

lst = [[1, 1000], [2, 350], [3, 130], [4, 70], [5, 30]]

df = pd.DataFrame(lst, columns=['order', 'order_total'])

rcs = df.loc[::-1, 'order_total'].cumsum()[::-1]
df['repeat_purchase_prob'] = rcs.shift(-1) / rcs

print(df)

Upvotes: 1

Related Questions