Reputation: 747
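I want to calculate, for each customer, the frequency (number of months with a transaction) and the monetary value (sum of trx_amount), separately for the periods before and after the subscribe_date.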
Initial dataframe
import numpy as np
import pandas as pd
ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark','Fay','Stones'],
'subscribe_date':['2020-11-30','2020-07-01','2020-01-02','2020-07-01','2020-11-30','2020-11-30','2020-11-30',
'2020-01-02','2020-07-01'],
'trx_month':['2020-12-01','2020-07-01','2020-07-01','2021-03-01','2021-02-01','2020-09-01','2020-11-01',
'2020-08-01','2018-02-01'],
'trx_amount':[100,90,50,45,20,30,50,80,200],
}
ad = pd.DataFrame(ad)
ad = ad.sort_values(by=['customer','trx_month'])
Expected dataframe (BEFORE)
ad2 = {'customer':['Clark','Stones','Fay'],
'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
'frequency':[2,1,np.nan], # number of months the customers transacted before the subscribe_date
'monetary':[80,200,np.nan]} # sum of trx_amount before the subscribe_date
ad2 = pd.DataFrame(ad2)
ad2
Expected dataframe (AFTER)
ad3 = {'customer':['Clark','Stones','Fay'],
'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
'frequency':[2,1,2], # number of months the customers transacted after the subscribe_date
'monetary':[120,45,130]} # sum of trx_amount after the subscribe_date
ad3 = pd.DataFrame(ad3)
ad3
Explanation: Clark subscribed on 30 Nov 2020. Before subscribing, he transacted in September and November 2020 (frequency = 2), and those transactions summed to 80. After subscribing, he transacted again in December 2020 and February 2021 (frequency = 2, monetary = 120).
Ignoring the subscribe date, frequency and monetary can be calculated with a plain pandas groupby, but I don't see how to apply this additional restriction.
Ideally the code would be flexible enough to switch to the AFTER-subscription period as well, so I can compare the before/after effect.
Upvotes: 1
Views: 433
Reputation: 75080
IIUC, you can assign a conditional column based on the difference between the subscribe date and the transaction date, and then group by it:
First convert the date columns from strings to datetime (skip this block if they are already datetime):
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])
Then use:
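import numpy as np

# Map the aggregation names to the column names wanted in the result
d = {'count':'frequency','sum':'monetary'}
# Positive: the transaction precedes the subscribe date; negative: it follows it
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days
out = (ad.assign(Before_After=
                 np.select([diff_<0,diff_>0],["After","Before"],"Subscribed_date"))
         .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
         .rename(columns=d))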
print(out)
                          frequency  monetary
customer Before_After
Clark    After                    2       120
         Before                   2        80
Fay      After                    2       130
Stones   After                    1        45
         Before                   1       200
         Subscribed_date          1        90
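To make the labelling step easier to follow, here is a minimal sketch on a toy frame (the `toy` and `diff_days` names are made up for illustration). It assumes the same sign convention as above: a positive day difference means the transaction came before the subscription.

```python
import numpy as np
import pandas as pd

# Three toy rows for one subscriber: a transaction before, on, and after
# the subscribe date (values borrowed from Stones in the question).
toy = pd.DataFrame({
    "subscribe_date": pd.to_datetime(["2020-07-01"] * 3),
    "trx_month": pd.to_datetime(["2018-02-01", "2020-07-01", "2021-03-01"]),
})

# subscribe_date - trx_month in whole days:
# positive -> transaction precedes the subscription, negative -> follows it.
diff_days = toy["subscribe_date"].sub(toy["trx_month"]).dt.days

# np.select takes the label of the first condition that holds,
# and falls back to the default when neither does (difference == 0).
toy["Before_After"] = np.select(
    [diff_days < 0, diff_days > 0],
    ["After", "Before"],
    default="Subscribed_date",
)
print(toy)
```

Rows that fall exactly on the subscribe date get their own label, which is why Stones shows a third group in the output above.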
EDIT: Per your edit, you can create a dictionary with Before and After as keys and the respective dataframes as values:
d = {'count':'frequency','sum':'monetary'}
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days
out = (ad.assign(Before_After=
                 np.select([diff_<0,diff_>0],["After","Before"],"Subscribed_date"))
         .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
         .rename(columns=d)).unstack().swaplevel(axis=1)
# One dataframe per label (After, Before, Subscribed_date), indexed by customer
final_dict = {i: out.loc[:,i] for i in out.columns.levels[0]}
print(final_dict['Before'],'\n\n',final_dict["After"])
          frequency  monetary
customer
Clark           2.0      80.0
Fay             NaN       NaN
Stones          1.0     200.0

          frequency  monetary
customer
Clark           2.0     120.0
Fay             2.0     130.0
Stones          1.0      45.0
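If you would rather switch between the two periods with a parameter, the same idea can be wrapped in a small function. This is only a sketch: `summarize_period` and its `period` argument are hypothetical names, not part of pandas. It also assumes "frequency" means the number of distinct months, so it uses `nunique` on `trx_month` instead of `count`; on the sample data both give the same numbers.

```python
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark','Stones','Fay','Stones','Clark','Clark','Clark','Fay','Stones'],
    'subscribe_date': ['2020-11-30','2020-07-01','2020-01-02','2020-07-01','2020-11-30',
                       '2020-11-30','2020-11-30','2020-01-02','2020-07-01'],
    'trx_month': ['2020-12-01','2020-07-01','2020-07-01','2021-03-01','2021-02-01',
                  '2020-09-01','2020-11-01','2020-08-01','2018-02-01'],
    'trx_amount': [100,90,50,45,20,30,50,80,200],
})

def summarize_period(df, period="before"):
    """Frequency and monetary per customer, strictly before or after subscribe_date."""
    if period not in ("before", "after"):
        raise ValueError("period must be 'before' or 'after'")
    sub = pd.to_datetime(df["subscribe_date"])
    trx = pd.to_datetime(df["trx_month"])
    # Transactions falling exactly on the subscribe date are excluded from both periods.
    mask = trx < sub if period == "before" else trx > sub
    out = (
        df.assign(trx_month=trx)[mask]
        .groupby("customer")
        .agg(frequency=("trx_month", "nunique"), monetary=("trx_amount", "sum"))
    )
    # Reindex so customers with no transactions in the period show up as NaN.
    all_customers = pd.Index(sorted(df["customer"].unique()), name="customer")
    return out.reindex(all_customers)

before = summarize_period(ad, "before")
after = summarize_period(ad, "after")
print(before, "\n\n", after)
```

The reindex step is what keeps Fay in the "before" result with NaN values, matching the expected dataframe in the question.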
Upvotes: 1