Reputation: 45
I have created a dataframe like that:
import pandas as pd
d = {'Time': ['01.07.2019, 06:21:33', '01.07.2019, 06:32:01', '01.07.2019, 06:57:33', '01.07.2019, 07:24:33','01.07.2019, 08:26:25', '01.07.2019, 09:12:44']
,'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed']
,'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'Geco' , 'Geco'],
'Group': ['1', '1', '2','2','3','3']}
df = pd.DataFrame(data=d)
output:
Time Action Name Group
0 01.07.2019, 06:21:33 Opened Bayer 1
1 01.07.2019, 06:32:01 Closed Bayer 1
2 01.07.2019, 06:57:33 Opened ITM 2
3 01.07.2019, 07:24:33 Closed ITM 2
4 01.07.2019, 08:26:25 Opened Geco 3
5 01.07.2019, 09:12:44 Closed Geco 3
so now i'm trying to calculate the Time difference for each group and the Time difference between these groups in minutes. So For example the Time difference in the Group Bayer should be 10 minutes and 28 seconds and the Time difference between Bayer and ITM should be 25 minutes and 32 seconds. After that the Time difference between the same group should be displayed in a column at the same line where the group begins and the Time difference between two different groups should be displayed in another column at the same line where the group ends.
so the wished Output would be:
Time Action Name Group Time Difference(names) Time Difference(groups)
0 01.07.2019, 06:21:33 Opened Bayer 1 10:28
1 01.07.2019, 06:32:01 Closed Bayer 1 25:32
2 01.07.2019, 06:57:33 Opened ITM 2 27:00
3 01.07.2019, 07:24:33 Closed ITM 2 1:01:52
4 01.07.2019, 08:26:25 Opened Geco 3 46:19
5 01.07.2019, 09:12:44 Closed Geco 3
how could i do that?
Upvotes: 1
Views: 191
Reputation: 8219
Start by making datetime from string, then some groupbys and diffs:
df["Time"] = pd.to_datetime(df["Time"])
df["d1"] = df.groupby("Name")["Time"].diff().shift(-1).fillna("")
df["d2"] = (
df.groupby((df["Action"] == "Closed").cumsum())["Time"]
.diff()
.shift(-1)
.fillna("")
)
produces
| | Time | Action | Name | Group | d1 | d2 |
|---:|:--------------------|:---------|:-------|--------:|:----------------|:----------------|
| 0 | 2019-01-07 06:21:33 | Opened | Bayer | 1 | 0 days 00:10:28 | |
| 1 | 2019-01-07 06:32:01 | Closed | Bayer | 1 | | 0 days 00:25:32 |
| 2 | 2019-01-07 06:57:33 | Opened | ITM | 2 | 0 days 00:46:19 | |
| 3 | 2019-01-07 07:24:33 | Closed | ITM | 2 | | 0 days 01:01:52 |
| 4 | 2019-01-07 08:26:25 | Opened | Geco | 3 | 0 days 00:27:00 | |
| 5 | 2019-01-07 09:12:44 | Closed | Geco | 3 | | |
to explain a bit the d2
calc, this (df['Action'] == 'Closed').cumsum()
increments by 1 for each new 'Closed'
row. Here I print it alongside Action
for clarity, using this
df['d2_cond'] = (df['Action'] == 'Closed').cumsum()
df[['Action', 'd2_cond']]
prints
Action d2_cond
0 Opened 0
1 Closed 1
2 Opened 1
3 Closed 2
4 Opened 2
5 Closed 3
so we can groupby
on this list to put together each Closed
with the corresponding next Opened
Upvotes: 3