Reputation: 551
I have this Pandas dataframe:
I want a new DataFrame that groups the rows by ['ticket_id', 'time_a'] and adds a new column with the minimum time difference in hours (hh). This SQL code works:
SELECT ticket_id, DATEDIFF('hh', time_a, MIN(time_b)) each_diff from ...
I've tried grouping them, but the result is an object that I can't inspect.
Upvotes: 0
Views: 1168
Reputation: 1151
To group the data and get a column with the minimum time_b value per group, you can do:
df_grouped = df.groupby(['ticket_id', 'time_a'])['time_b'].min().reset_index()
I don't know the datatypes of your time_a and time_b columns, but if they are timestamps you can then do the following to get the difference in hours:
df_grouped['each_diff'] = (df_grouped['time_b'] - df_grouped['time_a']).dt.total_seconds() // 3600
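As a rough end-to-end check, the two steps above can be sketched on a small frame. The sample rows are made up for illustration, and the columns are assumed to already be datetimes (converted here with `pd.to_datetime`). Whole hours are taken from the total seconds, which is one option among several.

```python
import pandas as pd

# Made-up sample rows; the real data and dtypes are an assumption.
df = pd.DataFrame({
    "ticket_id": [1, 2, 2],
    "time_a": pd.to_datetime(["2021-07-21 12:00:01"] * 3),
    "time_b": pd.to_datetime([
        "2021-07-21 14:00:02",
        "2021-07-21 13:00:05",
        "2021-07-21 17:00:10",
    ]),
})

# Earliest time_b per (ticket_id, time_a) group, returned as a regular DataFrame.
df_grouped = df.groupby(["ticket_id", "time_a"])["time_b"].min().reset_index()

# Elapsed whole hours: total seconds floor-divided by 3600.
delta = df_grouped["time_b"] - df_grouped["time_a"]
df_grouped["each_diff"] = (delta.dt.total_seconds() // 3600).astype(int)

print(df_grouped)
```

Calling `reset_index()` is what turns the grouped result back into an ordinary DataFrame that prints like any other.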
Upvotes: 1
Reputation: 11321
For this sample DataFrame:
import pandas as pd

df = pd.DataFrame({
    'ticket_id': [1, 2, 2],
    'time_a': ['2021-07-21 12:00:01', '2021-07-21 12:00:01', '2021-07-21 12:00:01'],
    'time_b': ['2021-07-21 14:00:02', '2021-07-21 13:00:05', '2021-07-21 17:00:10']
})
# Parse the string columns into datetimes so they can be subtracted
df.time_a = pd.to_datetime(df.time_a)
df.time_b = pd.to_datetime(df.time_b)
   ticket_id              time_a              time_b
0          1 2021-07-21 12:00:01 2021-07-21 14:00:02
1          2 2021-07-21 12:00:01 2021-07-21 13:00:05
2          2 2021-07-21 12:00:01 2021-07-21 17:00:10
the following groupby with a named aggregation
df = df.groupby(['ticket_id', 'time_a'], as_index=False).agg(time_b_min=('time_b', 'min'))
df['diff'] = df.time_b_min - df.time_a
gives you:
   ticket_id              time_a          time_b_min            diff
0          1 2021-07-21 12:00:01 2021-07-21 14:00:02 0 days 02:00:01
1          2 2021-07-21 12:00:01 2021-07-21 13:00:05 0 days 01:00:04
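Since the question asks for hours rather than a timedelta, the diff column can be converted by dividing by a one-hour `pd.Timedelta`. This is a minimal sketch that rebuilds the grouped result shown above by hand. The column names `diff_hours` and `diff_hours_floor` are made up for illustration, and whether floored hours match the SQL `DATEDIFF('hh', ...)` result depends on the database, so treat that as an assumption.

```python
import pandas as pd

# Grouped result from above, rebuilt by hand so the snippet is self-contained.
df = pd.DataFrame({
    "ticket_id": [1, 2],
    "time_a": pd.to_datetime(["2021-07-21 12:00:01", "2021-07-21 12:00:01"]),
    "time_b_min": pd.to_datetime(["2021-07-21 14:00:02", "2021-07-21 13:00:05"]),
})
df["diff"] = df.time_b_min - df.time_a

# Fractional hours: true division by a one-hour Timedelta yields floats.
df["diff_hours"] = df["diff"] / pd.Timedelta(hours=1)

# Whole elapsed hours: floor division by a one-hour Timedelta.
df["diff_hours_floor"] = df["diff"] // pd.Timedelta(hours=1)

print(df[["ticket_id", "diff", "diff_hours", "diff_hours_floor"]])
```

Keeping the raw timedelta column alongside the numeric ones makes it easy to check the conversion by eye.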
Upvotes: 1