Reputation: 113
How to compute a variable (through an easy function) which is a row_counter of sorts.
For instance, suppose i have a data which is at Customer_ID level and has all transactions for each customer_id. I want a field created, which sorts transactions from earliest to latest data and assigns a counter variable
I am doing it slightly convoluted manner like below
cust_df['row_count'] = 1
cust_df = cust_df.sort_values(['cust_id', 'transaction_date'])
cust_trans_num = cust_df .groupby(['cust_id', 'transaction_number'])['row_count'].count().reset_index()
cust_trans_num = cust_trans_num .groupby(by=['cust_id', 'transaction_number']).sum().groupby(
level=[0]).cumsum().reset_index().rename(columns={'row_count': 'cust_trans_number'})
cust_df = pd.merge(cust_df , cust_trans_num , how='left', on=['cust_id', 'transaction_number'])
Input data is like below:
cust_id transaction_number transaction_date
1 ABC1 1/1/2018
1 DEF3 12/31/2017
1 XYZ2 4/1/2017
Output data should be like below:
cust_id transaction_number transaction_date cust_trans_number
1 XYZ2 4/1/2017 1
1 DEF3 12/31/2017 2
1 ABC1 1/1/2018 3
Let me know if there is a one-line solution to this?
Upvotes: 1
Views: 97
Reputation: 42916
Use:
pd.to_datetime
to convert your column to datettime typecust_id
& transaction_date
GroupBy.cumcount
df['transaction_date'] = pd.to_datetime(df['transaction_date'])
df = df.sort_values(['cust_id', 'transaction_date'])
df['cust_trans_number'] = df.groupby('cust_id').cumcount().add(1)
Output
cust_id transaction_number transaction_date cust_trans_number
2 1 XYZ2 2017-04-01 1
1 1 DEF3 2017-12-31 2
0 1 ABC1 2018-01-01 3
Upvotes: 1