Reputation: 3158
Having a dataframe like that:
Desirable result is to get aggregated IDs with time diffs between Start and End looking like that:
Tried simple groupings and diffs but it does not work:
df[df['Name'] == 'Start'].groupby('ID')['Time']-\
df[df['Name'] == 'End'].groupby('ID')['Time']
How this task can be done in pandas? Thanks!
Upvotes: 0
Views: 63
Reputation: 389
Here you go.
Generate data:
df = pd.DataFrame({'ID':[1, 1,2, 2],
'Name': ['Start', 'End', 'Start', 'End'],
'Time': [pd.datetime(2020, 1,1,0,1,0), pd.datetime(2020, 1,2,0,0,0),
pd.datetime(2020, 1,1,0,0,0), pd.datetime(2020, 1,2,0,0,0)]})
Get TimeDelta:
df_agg = df[df['Name'] == 'Start'].reset_index()[['ID', 'Time']]
df_agg = df_agg.rename(columns={"Time": "Start"})
df_agg['End'] = df[df['Name'] == 'End'].reset_index()['Time']
df_agg['TimeDelta'] = df_agg['End'] - df_agg['Start']
Get timediff as decimal value in days, like your example:
df_agg['TimeDiff_days'] = df_agg['TimeDelta'] / np.timedelta64(1,'D')
df_agg
Result:
ID Start End TimeDelta TimeDiff_days
0 1 2020-01-01 00:01:00 2020-01-02 0 days 23:59:00 0.999306
1 2 2020-01-01 00:00:00 2020-01-02 1 days 00:00:00 1.000000
Upvotes: 1
Reputation: 304
A possible solution is to join the table on itself like this:
df_start = df[df['Name'] == 'Start']
df_end = df[df['Name'] == 'End']
df_merge = df_start.merge(df_end, on='id', suffixes=('_start', '_end'))
df_merge['diff'] = df_merge['Time_end'] - df_merge['Time_start']
print(df_merge.to_string())
Output:
id Name_start Time_start Name_end Time_end diff
0 1 Start 2017-11-02 12:00:14 End 2017-11-07 22:45:13 5 days 10:44:59
1 2 Start 2018-01-28 06:53:09 End 2018-02-05 13:31:14 8 days 06:38:05
Upvotes: 1