tumir
tumir

Reputation: 49

find time difference between groups in dataframe with python

I'm using python's pandas.

I'm having the following orders dataframe. when each order have its order id, order time and different items id in the order. in this example I have three different groups - A,B,C:

  order_id                 time  item_id
0        A  2022-11-10 08:43:07        1
1        A  2022-11-10 08:43:07        2
2        A  2022-11-10 08:43:07        3
3        B  2022-11-10 08:46:27        1
4        B  2022-11-10 08:46:27        2
5        C  2022-11-10 08:58:45        3

I want to calculate the time difference between group A and B and then between group B and C, by the time order and save the result into another column

wanted result:

 order_id                 time  item_id        time_diff
0        A  2022-11-10 08:43:07        1                 
1        A  2022-11-10 08:43:07        2                 
2        A  2022-11-10 08:43:07        3                 
3        B  2022-11-10 08:46:27        1  0 days 00:03:20
4        B  2022-11-10 08:46:27        2  0 days 00:03:20
5        C  2022-11-10 08:58:45        3  0 days 00:12:18

how can I calculate the time difference between the groups when the time is similar for the entire group?

try using .diff() but I got only the difference inside the group:

df['time_diff'] = df.groupby('order_id')['time'].diff()

df
Out[141]: 
  order_id                time  item_id time_diff
0        A 2022-11-10 08:43:07        1       NaT
1        A 2022-11-10 08:43:07        2    0 days
2        A 2022-11-10 08:43:07        3    0 days
3        B 2022-11-10 08:46:27        1       NaT
4        B 2022-11-10 08:46:27        2    0 days
5        C 2022-11-10 08:58:45        3       NaT

I want the difference between the groups and not inside. I can calculate the difference with .last().diff() but I don't know how to save it as a column back to the dataframe:

df.groupby('order_id')['time'].last().diff().to_frame('time_diff')
Out[]: 
               time_diff
order_id                
A                    NaT
B        0 days 00:03:20
C        0 days 00:12:18

thanks

Upvotes: 0

Views: 53

Answers (1)

gtomer
gtomer

Reputation: 6564

You were on the right track. This will work for you:

diff = df.groupby('order_id')['time'].last().diff().to_frame('time_diff').reset_index()
df = df.merge(diff, on='order_id', how='left')
df

Upvotes: 1

Related Questions