Reputation: 3217
I have the below data frame, now I wanted to calculate how long each SOURCE in the Bad status.
For example, SOURCE A in the bad status from 2020-12-16 14:57:56 to 2020-12-16-14:58:01 which is around 5 seconds, and similarly SOURCE A is also in bad status for different time stamps. I wanted to calculate the duration for each source whenever its in Bad status
Below is the data frame code:
import pandas as pd
import datetime
times=[datetime.datetime(2020, 12, 16, 14, 57, 56, 647689),datetime.datetime(2020, 12, 16, 14, 57, 59, 650766),datetime.datetime(2020, 12, 16, 14, 58, 1, 655858),datetime.datetime(2020, 12, 16, 14, 58, 4, 657299),datetime.datetime(2020, 12, 16, 14, 58, 5, 661615),datetime.datetime(2020, 12, 16, 14, 58, 6, 662729),datetime.datetime(2020, 12, 16, 14, 58, 10, 663151),datetime.datetime(2020, 12, 16, 14, 58, 13, 664116),datetime.datetime(2020, 12, 16, 14, 58, 16, 664501),datetime.datetime(2020, 12, 16, 14, 58, 17, 668526)]
source=['A','A','A','A','A','A','A','B','B','B']
status=['Bad','Bad','Good','Bad','Bad','Good','Good','Good','Good','Good']
df=pd.DataFrame({'time':times, 'source':source,'status':status})
Upvotes: 1
Views: 123
Reputation: 2647
In [299]: df
Out[299]:
time source status grp
0 2020-12-16 14:57:56.647689 A Bad 1
1 2020-12-16 14:57:59.650766 A Bad 1
2 2020-12-16 14:58:01.655858 A Good 2
3 2020-12-16 14:58:04.657299 A Bad 3
4 2020-12-16 14:58:05.661615 A Bad 3
5 2020-12-16 14:58:06.662729 A Good 4
6 2020-12-16 14:58:10.663151 A Good 4
7 2020-12-16 14:58:13.664116 B Good 1
8 2020-12-16 14:58:16.664501 B Good 1
9 2020-12-16 14:58:17.668526 B Good 1
In [340]: df['time2'] = df.groupby('source').time.shift(-1)
In [341]: df['grp'] = df.groupby('source').status.transform(lambda x: (x != x.shift()).cumsum())
In [343]: df[df.status == 'Bad'].groupby(['source', 'grp']).agg({'time': min, 'time2': max}).reset_index().drop('grp', axis=1)
Out[343]:
source time time2
0 A 2020-12-16 14:57:56.647689 2020-12-16 14:58:01.655858
1 A 2020-12-16 14:58:04.657299 2020-12-16 14:58:06.662729
logic
grp
column assigns a group number to each status within the source so that consecutive statuses get the same valueUpvotes: 1