calculating amount total and frequency of transactions before a certain date column

Question

I want to calculate:

the number of months in which each customer transacted BEFORE he/she subscribed (frequency)
the total transaction amount before the subscribe date (monetary)

Initial dataframe

import numpy as np
import pandas as pd

ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark','Fay','Stones'],
    'subscribe_date':['2020-11-30','2020-07-01','2020-01-02','2020-07-01','2020-11-30','2020-11-30','2020-11-30',
                     '2020-01-02','2020-07-01'],
    'trx_month':['2020-12-01','2020-07-01','2020-07-01','2021-03-01','2021-02-01','2020-09-01','2020-11-01',
               '2020-08-01','2018-02-01'],
    'trx_amount':[100,90,50,45,20,30,50,80,200],
    }
ad = pd.DataFrame(ad)
ad = ad.sort_values(by=['customer','trx_month'])

Expected dataframe (BEFORE)

ad2 = {'customer':['Clark','Stones','Fay'],
    'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
    'frequency':[2,1,np.nan], # number of months the customers transacted before the subscribe_date
    'monetary':[80,200,np.nan]} # sum of trx_amount before the subscribe_date
ad2 = pd.DataFrame(ad2)
ad2

Expected dataframe (AFTER)

ad3 = {'customer':['Clark','Stones','Fay'],
    'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
    'frequency':[2,1,2], # number of months the customers transacted after the subscribe_date
    'monetary':[120,45,130]} # sum of trx_amount after the subscribe_date
ad3 = pd.DataFrame(ad3)
ad3

Explanation: Clark subscribed on 30 Nov 2020. Before subscribing, he transacted in September and November 2020 (frequency = 2), and those transactions sum to 80. After subscribing, he transacted again in December 2020 and February 2021 (frequency = 2, monetary = 120).

Ignoring the subscribe date, I can calculate the frequency and monetary values with a pandas groupby, but I am not sure how to apply the date restriction.

It would be nice if the code could easily be adjusted to the AFTER-subscription case (to compare the before/after effect).

anky · Accepted Answer

IIUC, you can assign a conditional column based on the difference between the subscribe date and the transaction date, and then group by it:

Convert the date columns from strings to datetime (skip this block if they are already datetime):

ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])

Then use:

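import numpy as np

# Map the aggregation names to the requested column names.
d = {'count': 'frequency', 'sum': 'monetary'}
# Days from each transaction to the subscribe date (positive = transaction came first).
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days

out = (ad.assign(Before_After=np.select([diff_ < 0, diff_ > 0],
                                        ["After", "Before"],
                                        "Subscribed_date"))
         .groupby(['customer', 'Before_After'])['trx_amount']
         .agg(['count', 'sum'])
         .rename(columns=d))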

print(out)
                          frequency  monetary
customer Before_After                        
Clark    After                    2       120
         Before                   2        80
Fay      After                    2       130
Stones   After                    1        45
         Before                   1       200
         Subscribed_date          1        90
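
Note that `count` counts rows, so a customer with two transactions in the same month would be counted twice. The following is a minimal sketch, not part of the original answer, that counts distinct months with `nunique` instead. It assumes `trx_month` holds one timestamp per calendar month; the names `before` and `rfm_before` are made up for illustration.

```python
import pandas as pd

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark', 'Clark', 'Clark', 'Fay', 'Stones'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2020-01-02', '2020-07-01', '2020-11-30',
                       '2020-11-30', '2020-11-30', '2020-01-02', '2020-07-01'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01', '2021-02-01',
                  '2020-09-01', '2020-11-01', '2020-08-01', '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])

# Keep only transactions strictly before the subscription date.
before = ad[ad['trx_month'] < ad['subscribe_date']]

rfm_before = (before.groupby('customer')
              .agg(frequency=('trx_month', 'nunique'),   # distinct months
                   monetary=('trx_amount', 'sum'))       # total amount
              # Reindex so customers with no prior transactions show up as NaN.
              .reindex(ad['customer'].unique()))
print(rfm_before)
```

Flipping the comparison to `>` gives the AFTER figures in the same way.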

EDIT: Per your edit, you can create a dictionary with Before and After as keys and the respective dataframes as values:

d = {'count': 'frequency', 'sum': 'monetary'}
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days

out = (ad.assign(Before_After=np.select([diff_ < 0, diff_ > 0],
                                        ["After", "Before"],
                                        "Subscribed_date"))
         .groupby(['customer', 'Before_After'])['trx_amount']
         .agg(['count', 'sum'])
         .rename(columns=d)
         .unstack()
         .swaplevel(axis=1))
# One dataframe per period: 'After', 'Before' and 'Subscribed_date'.
final_dict = {i: out.loc[:, i] for i in out.columns.levels[0]}

print(final_dict['Before'], '\n\n', final_dict['After'])

          frequency  monetary
customer                     
Clark           2.0      80.0
Fay             NaN       NaN
Stones          1.0     200.0 

           frequency  monetary
customer                     
Clark           2.0     120.0
Fay             2.0     130.0
Stones          1.0      45.0
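
To get a result shaped like the expected dataframes in the question (one row per customer, with `subscribe_date` kept as a column), the same idea can be wrapped in a small function. This is only a sketch under the sample data shown above; `summarize` and its `when` parameter are hypothetical names, not part of pandas or of the original answer.

```python
import pandas as pd

def summarize(df, when='before'):
    """Per-customer frequency and monetary totals before or after subscribing."""
    if when == 'before':
        mask = df['trx_month'] < df['subscribe_date']
    elif when == 'after':
        mask = df['trx_month'] > df['subscribe_date']
    else:
        raise ValueError("when must be 'before' or 'after'")

    stats = (df[mask].groupby('customer')['trx_amount']
             .agg(frequency='count', monetary='sum')
             .reset_index())
    # One row per customer; the left merge leaves NaN where nothing matched.
    base = df[['customer', 'subscribe_date']].drop_duplicates()
    return base.merge(stats, on='customer', how='left')

ad = pd.DataFrame({
    'customer': ['Clark', 'Stones', 'Fay', 'Stones', 'Clark', 'Clark', 'Clark', 'Fay', 'Stones'],
    'subscribe_date': ['2020-11-30', '2020-07-01', '2020-01-02', '2020-07-01', '2020-11-30',
                       '2020-11-30', '2020-11-30', '2020-01-02', '2020-07-01'],
    'trx_month': ['2020-12-01', '2020-07-01', '2020-07-01', '2021-03-01', '2021-02-01',
                  '2020-09-01', '2020-11-01', '2020-08-01', '2018-02-01'],
    'trx_amount': [100, 90, 50, 45, 20, 30, 50, 80, 200],
})
ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])

before_df = summarize(ad, 'before')
after_df = summarize(ad, 'after')
print(before_df)
print(after_df)
```

Transactions that fall exactly on the subscribe date are excluded from both results here, which mirrors the separate `Subscribed_date` group above; change one of the comparisons to `<=` or `>=` if they should count on one side.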
