Reputation: 63
I'm trying to add a "Monthly orders" column which calculates how many transactions a customer with specific id had within a specific CohortDate.
Basically, it's a COUNTIFS function where RANGES are all IDS and all CohortDates equal to ID and CohortDate of any given row.
Any help is much appreciated.
import pandas as pd
import numpy as np
df = pd.DataFrame({'order_id': [75054,75057,75059,75061,75066],
'customer_id': [101692,101694,101734,101692,101694],
'CohortDate': ['2016-05','2016-05','2016-05','2016-05','2016-06']
})
The result I would aim to get is the following:
order_id customer_id CohortDate Monthly_orders
75054 101692 '2016-05' 2
75057 101694 '2016-05' 1
75059 101734 '2016-05' 1
75061 101692 '2016-05' 2
75066 101694 '2016-06' 1
Upvotes: 1
Views: 93
Reputation: 63
to group by certain variables, we can use the transform
which applies the groupby accross a series rather than returning a new dataframe.
df.groupby(['customer_id','CohortDate'])['customer_id'].transform('count')
this returns the count across the original data frame.
order_id customer_id CohortDate count
0 75054 101692 2016-05 2
1 75057 101694 2016-05 1
2 75059 101734 2016-05 1
3 75061 101692 2016-05 2
4 75066 101694 2016-06 1
Upvotes: 2