Arthi
Arthi

Reputation: 45

Pandas Dataframe calculate Time difference for each group and Time difference between two different groups

I have created a dataframe like that:

import pandas as pd
d = {'Time': ['01.07.2019, 06:21:33', '01.07.2019, 06:32:01', '01.07.2019, 06:57:33', '01.07.2019, 07:24:33','01.07.2019, 08:26:25', '01.07.2019, 09:12:44']
     ,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
     ,'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'Geco' , 'Geco'],
               'Group': ['1', '1', '2','2','3','3']}
df = pd.DataFrame(data=d)

output:

    Time                    Action  Name    Group
0   01.07.2019, 06:21:33    Opened  Bayer   1
1   01.07.2019, 06:32:01    Closed  Bayer   1
2   01.07.2019, 06:57:33    Opened  ITM     2
3   01.07.2019, 07:24:33    Closed  ITM     2
4   01.07.2019, 08:26:25    Opened  Geco    3
5   01.07.2019, 09:12:44    Closed  Geco    3

so now i'm trying to calculate the Time difference for each group and the Time difference between these groups in minutes. So For example the Time difference in the Group Bayer should be 10 minutes and 28 seconds and the Time difference between Bayer and ITM should be 25 minutes and 32 seconds. After that the Time difference between the same group should be displayed in a column at the same line where the group begins and the Time difference between two different groups should be displayed in another column at the same line where the group ends.

so the wished Output would be:

    Time                    Action  Name    Group Time Difference(names) Time Difference(groups)
0   01.07.2019, 06:21:33    Opened  Bayer   1          10:28
1   01.07.2019, 06:32:01    Closed  Bayer   1                                   25:32
2   01.07.2019, 06:57:33    Opened  ITM     2          27:00         
3   01.07.2019, 07:24:33    Closed  ITM     2                                   1:01:52
4   01.07.2019, 08:26:25    Opened  Geco    3          46:19
5   01.07.2019, 09:12:44    Closed  Geco    3

how could i do that?

Upvotes: 1

Views: 191

Answers (1)

piterbarg
piterbarg

Reputation: 8219

Start by making datetime from string, then some groupbys and diffs:

df["Time"] = pd.to_datetime(df["Time"])
df["d1"] = df.groupby("Name")["Time"].diff().shift(-1).fillna("")
df["d2"] = (
    df.groupby((df["Action"] == "Closed").cumsum())["Time"]
    .diff()
    .shift(-1)
    .fillna("")
)

produces

|    | Time                | Action   | Name   |   Group | d1              | d2              |
|---:|:--------------------|:---------|:-------|--------:|:----------------|:----------------|
|  0 | 2019-01-07 06:21:33 | Opened   | Bayer  |       1 | 0 days 00:10:28 |                 |
|  1 | 2019-01-07 06:32:01 | Closed   | Bayer  |       1 |                 | 0 days 00:25:32 |
|  2 | 2019-01-07 06:57:33 | Opened   | ITM    |       2 | 0 days 00:46:19 |                 |
|  3 | 2019-01-07 07:24:33 | Closed   | ITM    |       2 |                 | 0 days 01:01:52 |
|  4 | 2019-01-07 08:26:25 | Opened   | Geco   |       3 | 0 days 00:27:00 |                 |
|  5 | 2019-01-07 09:12:44 | Closed   | Geco   |       3 |                 |                 |

to explain a bit the d2 calc, this (df['Action'] == 'Closed').cumsum() increments by 1 for each new 'Closed' row. Here I print it alongside Action for clarity, using this

df['d2_cond'] = (df['Action'] == 'Closed').cumsum()
df[['Action', 'd2_cond']]

prints


Action  d2_cond
0   Opened  0
1   Closed  1
2   Opened  1
3   Closed  2
4   Opened  2
5   Closed  3

so we can groupby on this list to put together each Closed with the corresponding next Opened

Upvotes: 3

Related Questions