noob
noob

Reputation: 3811

Dynamic Dates difference calculation Pandas

 customer_id    Order_date 
        1       2015-01-16      
        1       2015-01-19      
        2       2014-12-21      
        2       2015-01-10      
        1       2015-01-10
        3       2018-01-18
        3       2017-03-04
        4       2019-11-05
        4       2010-01-01
        3       2019-02-03      

Lets say I have data like this

Basically for an ecommerce firm some people buy regularly, some buy once every year, some buy monthly once etc. I need to find the difference between frequency of each transaction for each customer.

This will be a dynamic list, since some people will have transacted thousand times, some would have transacted once, some ten times etc. Any ideas on how to achieve this.

Output needed:

   customer_id  Order_date_Difference_in_days 
        1       6,3  #Difference b/w first 2 dates 2015-01-10 and 2015-01-16 
                     #is 6 days and diff b/w next 2 consecutive dates is                                                   
                     #2015-01-16 and 2015-01-19 is #3 days   
        2       20      
        3       320,381
        4       3596

Basically these are the differences between dates after sorting them first for each customer id

Upvotes: 3

Views: 175

Answers (2)

Umar.H
Umar.H

Reputation: 23099

First we need to sort the data by customer id and the order date

ensure your datetime is a proper date time call df['Order_date'] = pd.to_datetime(df['Order_date'])

df.sort_values(['customer_id','Order_date'],inplace=True)

df["days"] = df.groupby("customer_id")["Order_date"].apply(
    lambda x: (x - x.shift()) / np.timedelta64(1, "D")
)

print(df)
  customer_id Order_date    days
4            1 2015-01-10     NaN
0            1 2015-01-16     6.0
1            1 2015-01-19     3.0
2            2 2014-12-21     NaN
3            2 2015-01-10    20.0
6            3 2017-03-04     NaN
5            3 2018-01-18   320.0
9            3 2019-02-03   381.0
8            4 2010-01-01     NaN
7            4 2019-11-05  3595.0

then you can do a simple agg but you'll need to conver the value into a string.

df.dropna().groupby("customer_id")["days"].agg(
    lambda x: ",".join(x.astype(str))
).to_frame()
                    days
customer_id             
1                6.0,3.0
2                   20.0
3            320.0,381.0
4                 3595.0

Upvotes: 3

anky
anky

Reputation: 75080

You can also use the below for the current output:

m=(df.assign(Diff=df.sort_values(['customer_id','Order_date'])
    .groupby('customer_id')['Order_date'].diff().dt.days).dropna())

m=m.assign(Diff=m['Diff'].astype(str)).groupby('customer_id')['Diff'].agg(','.join)

customer_id
1        6.0,3.0
2           20.0
3    320.0,381.0
4         3595.0
Name: Diff, dtype: object

Upvotes: 4

Related Questions