Reputation: 484
How would you do the following?
For customers with churn=1, take the average of their last 3 purchases based on the month they leave. E.g. Churn_Month=3, then average last 3 purchases: from Mar, Feb and Jan if available. Sometimes would be 2 or 1 purchase.
For customers with churn=0, take the average of their last 3 purchases when available, sometimes would be 2 or 1 purchase.
And put it all in one pandas Dataframe. See Expected Output
Available information
df1: Here you'll find transactions with customer id, date, purchase1 and purchase2.
ID DATE P1 P2
0 1 2003-04-01 449 55
1 4 2003-02-01 406 213
2 3 2003-11-01 332 372
3 1 2003-03-01 61 336
4 3 2003-10-01 428 247
5 3 2003-12-01 335 339
6 3 2003-09-01 367 41
7 2 2003-01-01 11 270
8 1 2003-01-01 55 102
9 2 2003-02-01 244 500
10 1 2003-02-01 456 272
11 5 2003-03-01 240 180
12 4 2002-12-01 156 152
13 5 2003-01-01 144 185
14 4 2003-01-01 246 428
15 1 2003-05-01 492 97
16 5 2003-02-01 371 66
17 5 2003-04-01 246 428
18 5 2003-05-01 406 213
df2: Here you'll find customer ID, whether they leave the company or not and the month when they left (E.g. 3.0 = March)
ID Churn Churn_Month
0 1 1 3.0
1 2 0 0.0
2 3 1 12.0
3 4 0 0.0
4 5 1 4.0
Expected Output:
Mean of P1 and P2 by ID, merged with df2 information. ID will be the new index.
ID P1 P2 Churn Churn_Month
1 190.6 236.6 1 3.0
2 127.5 385 0 0.0
3 365 319.3 1 12.0
4 269.3 264.3 0 0.0
5 285.6 224.6 1 4.0
Upvotes: 0
Views: 154
Reputation: 3285
Some extra details were necessary here. First, when Churn == 1
assume that the customer left. Using df2
you can determine which month they left and remove any data that occurred after. From there it's pretty straight forward in terms of grouping, aggregating, and filter the data.
# merge
df3 = df1.merge(df2)
# convert DATE to datetime
df3.DATE = pd.to_datetime(df3.DATE)
# filter rows where month of (DATE is <= Churn_Month and Churn == 1)
# or Churn == 0
df3 = df3.loc[
((df3.Churn == 1) & (df3.DATE.dt.month <= df3.Churn_Month)) |
(df3.Churn == 0)
].copy()
# sort values ascending
df3.sort_values([
'ID',
'DATE',
'P1',
'P2',
'Churn',
'Churn_Month'
], inplace=True)
# groupby ID, Churn
# take last 3 DATEs
# merge with original to filter rows
# group on ID, Churn, and Churn_Month
# average P1 and P2
# reset_index to get columns back
# round results to 1 decimal at the end
df3.groupby([
'ID',
'Churn'
]).DATE.nth([
-1, -2, -3
]).reset_index().merge(df3).groupby([
'ID',
'Churn',
'Churn_Month'
])[[
'P1',
'P2'
]].mean().reset_index().round(1)
Results
ID Churn Churn_Month P1 P2
0 1 1 3.0 190.7 236.7
1 2 0 0.0 127.5 385.0
2 3 1 12.0 365.0 319.3
3 4 0 0.0 269.3 264.3
4 5 1 4.0 285.7 224.7
Upvotes: 1