Luc

Reputation: 747

Calculating total amount and frequency of transactions before a certain date column

I want to calculate:

  1. the number of months in which each customer transacted BEFORE he/she subscribed (frequency)
  2. the total transaction amount before the subscription date column (monetary)

Initial dataframe

import numpy as np
import pandas as pd

ad = {'customer':['Clark','Stones','Fay','Stones','Clark','Clark','Clark','Fay','Stones'],
      'subscribe_date':['2020-11-30','2020-07-01','2020-01-02','2020-07-01','2020-11-30','2020-11-30','2020-11-30',
                        '2020-01-02','2020-07-01'],
      'trx_month':['2020-12-01','2020-07-01','2020-07-01','2021-03-01','2021-02-01','2020-09-01','2020-11-01',
                   '2020-08-01','2018-02-01'],
      'trx_amount':[100,90,50,45,20,30,50,80,200],
      }
ad = pd.DataFrame(ad)
ad = ad.sort_values(by=['customer','trx_month'])

Expected dataframe (BEFORE)

ad2 = {'customer':['Clark','Stones','Fay'],
       'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
       'frequency':[2,1,np.nan], # number of months the customer transacted before the subscribe_date
       'monetary':[80,200,np.nan]} # sum of trx_amount before the subscribe_date
ad2 = pd.DataFrame(ad2)
ad2

Expected dataframe (AFTER)

ad3 = {'customer':['Clark','Stones','Fay'],
       'subscribe_date':['2020-11-30','2020-07-01','2020-01-02'],
       'frequency':[2,1,2], # number of months the customer transacted after the subscribe_date
       'monetary':[120,45,130]} # sum of trx_amount after the subscribe_date
ad3 = pd.DataFrame(ad3)
ad3

Explanation: Clark subscribed on 30 Nov 2020. Before subscribing, he transacted in September and November 2020 (frequency = 2), and those transactions sum to 80. After subscribing, he transacted again in December 2020 and February 2021 (frequency = 2, monetary = 120).

If I ignore the subscribe date, frequency and monetary can be calculated with a pandas groupby, but I am not sure how to handle this new restriction.

It would be nice if the code were flexible enough to also cover the period AFTER subscription (to compare the before/after effect).

Upvotes: 1

Views: 433

Answers (1)

anky

Reputation: 75080

IIUC, you can assign a conditional column based on the difference between the subscribe date and the trx date, and then group by it.

First convert the date columns from strings to datetime (skip this block if they are already datetime):

ad['subscribe_date'] = pd.to_datetime(ad['subscribe_date'])
ad['trx_month'] = pd.to_datetime(ad['trx_month'])

Then use:

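import numpy as np

# Map the aggregation names to the requested column names
d = {'count':'frequency','sum':'monetary'}
# Positive: transaction before subscription; negative: after; zero: same day
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days

out = (ad.assign(Before_After=
                 np.select([diff_<0, diff_>0], ["After","Before"], "Subscribed_date"))
         .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
         .rename(columns=d))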

print(out)
                          frequency  monetary
customer Before_After                        
Clark    After                    2       120
         Before                   2        80
Fay      After                    2       130
Stones   After                    1        45
         Before                   1       200
         Subscribed_date          1        90
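
To make the sign convention of the difference easier to check, here is a minimal standalone sketch of the labelling step alone. The three dates are Stones' rows from the question; the variable names `subscribe`, `trx`, `diff_days` and `labels` are only illustrative and do not appear in the code above.

```python
import numpy as np
import pandas as pd

# Stones subscribed on 2020-07-01 and has one transaction before,
# one on the same day, and one after that date.
subscribe = pd.to_datetime(pd.Series(['2020-07-01', '2020-07-01', '2020-07-01']))
trx = pd.to_datetime(pd.Series(['2018-02-01', '2020-07-01', '2021-03-01']))

# subscribe - trx: positive means the transaction came first,
# negative means it came after, zero means the same day.
diff_days = subscribe.sub(trx).dt.days

# Conditions are checked in order; rows matching neither get the default.
labels = np.select([diff_days < 0, diff_days > 0],
                   ['After', 'Before'],
                   'Subscribed_date')

print(labels.tolist())
```

A transaction on the subscription date itself falls into neither condition, which is why Stones gets a separate Subscribed_date row in the output above.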

EDIT: Per your edit, you can create a dictionary with Before and After as keys and the respective dataframes as values:

d = {'count':'frequency','sum':'monetary'}
diff_ = ad['subscribe_date'].sub(ad['trx_month']).dt.days

out = (ad.assign(Before_After=
                 np.select([diff_<0, diff_>0], ["After","Before"], "Subscribed_date"))
         .groupby(['customer','Before_After'])['trx_amount'].agg(['count','sum'])
         .rename(columns=d)).unstack().swaplevel(axis=1)
# One dataframe per label (Before, After, Subscribed_date), indexed by customer
final_dict = {i: out.loc[:,i] for i in out.columns.levels[0]}

print(final_dict['Before'],'\n\n',final_dict["After"])

          frequency  monetary
customer                     
Clark           2.0      80.0
Fay             NaN       NaN
Stones          1.0     200.0 

           frequency  monetary
customer                     
Clark           2.0     120.0
Fay             2.0     130.0
Stones          1.0      45.0
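
If a single switch between the two periods is more convenient, the same idea could be wrapped in a small function. This is only a sketch under a few assumptions: the helper name `summarize_by_subscription` and its `when` parameter are hypothetical, transactions dated exactly on the subscription date are excluded from both periods, and frequency is counted as distinct `trx_month` values (`nunique`) rather than rows (`count`). The two only differ if a customer has more than one row in the same month.

```python
import pandas as pd

def summarize_by_subscription(df, when="before"):
    """Hypothetical helper: frequency and monetary per customer,
    restricted to transactions before or after the subscription date."""
    df = df.assign(
        subscribe_date=pd.to_datetime(df["subscribe_date"]),
        trx_month=pd.to_datetime(df["trx_month"]),
    )
    if when == "before":
        mask = df["trx_month"] < df["subscribe_date"]
    elif when == "after":
        mask = df["trx_month"] > df["subscribe_date"]
    else:
        raise ValueError("when must be 'before' or 'after'")

    summary = (
        df[mask]
        .groupby("customer")
        .agg(frequency=("trx_month", "nunique"),   # distinct months (assumption)
             monetary=("trx_amount", "sum"))
    )
    # Reindex so customers with no transactions in the period show up as NaN
    customers = pd.Index(sorted(df["customer"].unique()), name="customer")
    return summary.reindex(customers)

# Data from the question
ad = pd.DataFrame({
    "customer": ["Clark", "Stones", "Fay", "Stones", "Clark", "Clark", "Clark", "Fay", "Stones"],
    "subscribe_date": ["2020-11-30", "2020-07-01", "2020-01-02", "2020-07-01", "2020-11-30",
                       "2020-11-30", "2020-11-30", "2020-01-02", "2020-07-01"],
    "trx_month": ["2020-12-01", "2020-07-01", "2020-07-01", "2021-03-01", "2021-02-01",
                  "2020-09-01", "2020-11-01", "2020-08-01", "2018-02-01"],
    "trx_amount": [100, 90, 50, 45, 20, 30, 50, 80, 200],
})

before = summarize_by_subscription(ad, when="before")
after = summarize_by_subscription(ad, when="after")
print(before, "\n\n", after)
```

On the sample data this should give the same numbers as final_dict['Before'] and final_dict['After'], with Fay showing NaN in the before period.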

Upvotes: 1
