Reputation: 87
I want to calculate repeat purchase probability based on the following example. I want to run this calculation on a Pandas DataFrame and I'm having a hard time figuring out how to do this incrementally for each row.
Example:
In this example, the repeat order probability of customers who have placed one order in their lifetime to placing a second order is: (350 + 130) / (350+130+1,000) = 32.43%
Here's an example using iloc to get the same result as the example but it does not iterate through the dataframe.
lst = [[1, 1000], [2, 350],
[3, 130], [4,70], [5,30]]
df = pd.DataFrame(lst, columns =['order', 'order_total'])
df['repeat_purchase_prob'] = df['order_total'].iloc[1:].sum() / df['frequency_total'].iloc[0:].sum()
And here's a link to the post I've read: https://blog.digitalj2.com/understand-the-power-of-repeat-order-probability
The end result should look like this:
order | order_total | repeat_purchase_prob |
---|---|---|
1 | 1000 | 0.37 |
2 | 350 | 0.40 |
3 | 130 | 0.43 |
And so on..
Upvotes: 2
Views: 596
Reputation: 35626
Try a reversed cumsum
then shift
and divide:
rcs = df.loc[::-1, 'order_total'].cumsum()[::-1]
df['repeat_purchase_prob'] = rcs.shift(-1) / rcs
df
:
order order_total repeat_purchase_prob
0 1 1000 0.367089
1 2 350 0.396552
2 3 130 0.434783
3 4 70 0.300000
4 5 30 NaN
Optional fill NaN
with 0
via div
:
df['repeat_purchase_prob'] = rcs.shift(-1).div(rcs, fill_value=0)
df
:
order order_total repeat_purchase_prob
0 1 1000 0.367089
1 2 350 0.396552
2 3 130 0.434783
3 4 70 0.300000
4 5 30 0.000000
Complete Working Example:
import pandas as pd
lst = [[1, 1000], [2, 350], [3, 130], [4, 70], [5, 30]]
df = pd.DataFrame(lst, columns=['order', 'order_total'])
rcs = df.loc[::-1, 'order_total'].cumsum()[::-1]
df['repeat_purchase_prob'] = rcs.shift(-1) / rcs
print(df)
Upvotes: 1