Reputation: 3811
customer_id Order_date
1 2015-01-16
1 2015-01-19
2 2014-12-21
2 2015-01-10
1 2015-01-10
3 2018-01-18
3 2017-03-04
4 2019-11-05
4 2010-01-01
3 2019-02-03
Lets say I have data like this
Basically for an ecommerce firm some people buy regularly, some buy once every year, some buy monthly once etc. I need to find the difference between frequency of each transaction for each customer.
This will be a dynamic list, since some people will have transacted thousand times, some would have transacted once, some ten times etc. Any ideas on how to achieve this.
Output needed:
customer_id Order_date_Difference_in_days
1 6,3 #Difference b/w first 2 dates 2015-01-10 and 2015-01-16
#is 6 days and diff b/w next 2 consecutive dates is
#2015-01-16 and 2015-01-19 is #3 days
2 20
3 320,381
4 3596
Basically these are the differences between dates after sorting them first for each customer id
Upvotes: 3
Views: 175
Reputation: 23099
First we need to sort the data by customer id and the order date
ensure your datetime is a proper date time call df['Order_date'] = pd.to_datetime(df['Order_date'])
df.sort_values(['customer_id','Order_date'],inplace=True)
df["days"] = df.groupby("customer_id")["Order_date"].apply(
lambda x: (x - x.shift()) / np.timedelta64(1, "D")
)
print(df)
customer_id Order_date days
4 1 2015-01-10 NaN
0 1 2015-01-16 6.0
1 1 2015-01-19 3.0
2 2 2014-12-21 NaN
3 2 2015-01-10 20.0
6 3 2017-03-04 NaN
5 3 2018-01-18 320.0
9 3 2019-02-03 381.0
8 4 2010-01-01 NaN
7 4 2019-11-05 3595.0
then you can do a simple agg but you'll need to conver the value into a string.
df.dropna().groupby("customer_id")["days"].agg(
lambda x: ",".join(x.astype(str))
).to_frame()
days
customer_id
1 6.0,3.0
2 20.0
3 320.0,381.0
4 3595.0
Upvotes: 3
Reputation: 75080
You can also use the below for the current output:
m=(df.assign(Diff=df.sort_values(['customer_id','Order_date'])
.groupby('customer_id')['Order_date'].diff().dt.days).dropna())
m=m.assign(Diff=m['Diff'].astype(str)).groupby('customer_id')['Diff'].agg(','.join)
customer_id
1 6.0,3.0
2 20.0
3 320.0,381.0
4 3595.0
Name: Diff, dtype: object
Upvotes: 4