Kaiser
Kaiser

Reputation: 197

How to Calculate mean gap between dates for a particular id in python?

My input dataframe is something like this: here for every company we can have multiple salesid and each salesid has unique create date.

CompanyName Salesid  Create Date
ABC         1        1-1-2020 
ABC         22       4-1-2020
ABC         3        15-1-2020
ABC         4        10-1-2020
XYZ         34       19-2-2020
XYZ         56       23-2-2020
XYZ         23       11-2-2020
XYZ         87       27-2-2020
XYZ         101      5-2-2020

I want to calculate the mean createdate gap for each company: I am expecting an output in this format:

Name   Mean_createdate_gap
ABC    4.66 
XYZ    5.5    

explanation:

 ABC =>  (3+6+5)/3 = 4.66 (cumulative diff between dates)
 XYZ =>  (6+8+4+4)/4 = 5.5

For this first, we may need to sort the data followed by grouping by companyname. I am not sure how I suppose to implement it.

Upvotes: 2

Views: 245

Answers (2)

Vishnudev Krishnadas
Vishnudev Krishnadas

Reputation: 10960

Covert Create column to datetime

df['Create'] = pd.to_datetime(df['Create'], format='%d-%m-%Y')

Sort by this column

df = df.sort_values(by=['Create'])

Do a groupby aggregate with cummulative differenced mean

df.groupby('CompanyName')['Create'].agg(lambda x: x.diff().abs().mean())

CompanyName
ABC   4 days 16:00:00
XYZ   5 days 12:00:00
Name: Create, dtype: timedelta64[ns]

Upvotes: 2

Balaji Ambresh
Balaji Ambresh

Reputation: 5012

Here you go:

df['Create Date'] = pd.to_datetime(df['Create Date'], format='%d-%m-%Y')
res = df.sort_values(by='Create Date')\
    .groupby('CompanyName', sort=False)['Create Date']\
    .agg(lambda cd : cd.diff().map(lambda dt: dt.days).mean()).reset_index()\
    .rename(columns={'CompanyName': 'Name', 'Create Date': 'Mean_createdate_gap'})
print(res)

Output

  Name  Mean_createdate_gap
0  ABC             4.666667
1  XYZ             5.500000

Upvotes: 2

Related Questions