Reputation: 197
My input dataframe is something like this: here for every company we can have multiple salesid and each salesid has unique create date.
CompanyName Salesid Create Date
ABC 1 1-1-2020
ABC 22 4-1-2020
ABC 3 15-1-2020
ABC 4 10-1-2020
XYZ 34 19-2-2020
XYZ 56 23-2-2020
XYZ 23 11-2-2020
XYZ 87 27-2-2020
XYZ 101 5-2-2020
I want to calculate the mean createdate gap for each company: I am expecting an output in this format:
Name Mean_createdate_gap
ABC 4.66
XYZ 5.5
explanation:
ABC => (3+6+5)/3 = 4.66 (cumulative diff between dates)
XYZ => (6+8+4+4)/4 = 5.5
For this first, we may need to sort the data followed by grouping by companyname. I am not sure how I suppose to implement it.
Upvotes: 2
Views: 245
Reputation: 10960
Covert Create
column to datetime
df['Create'] = pd.to_datetime(df['Create'], format='%d-%m-%Y')
Sort by this column
df = df.sort_values(by=['Create'])
Do a groupby aggregate with cummulative differenced mean
df.groupby('CompanyName')['Create'].agg(lambda x: x.diff().abs().mean())
CompanyName
ABC 4 days 16:00:00
XYZ 5 days 12:00:00
Name: Create, dtype: timedelta64[ns]
Upvotes: 2
Reputation: 5012
Here you go:
df['Create Date'] = pd.to_datetime(df['Create Date'], format='%d-%m-%Y')
res = df.sort_values(by='Create Date')\
.groupby('CompanyName', sort=False)['Create Date']\
.agg(lambda cd : cd.diff().map(lambda dt: dt.days).mean()).reset_index()\
.rename(columns={'CompanyName': 'Name', 'Create Date': 'Mean_createdate_gap'})
print(res)
Output
Name Mean_createdate_gap
0 ABC 4.666667
1 XYZ 5.500000
Upvotes: 2