Create time duration from the data frames in pandas python

Question

I have the below data frame, now I wanted to calculate how long each SOURCE in the Bad status.

For example, SOURCE A in the bad status from 2020-12-16 14:57:56 to 2020-12-16-14:58:01 which is around 5 seconds, and similarly SOURCE A is also in bad status for different time stamps. I wanted to calculate the duration for each source whenever its in Bad status

Below is the data frame code:

import pandas as pd
import datetime

times=[datetime.datetime(2020, 12, 16, 14, 57, 56, 647689),datetime.datetime(2020, 12, 16, 14, 57, 59, 650766),datetime.datetime(2020, 12, 16, 14, 58, 1, 655858),datetime.datetime(2020, 12, 16, 14, 58, 4, 657299),datetime.datetime(2020, 12, 16, 14, 58, 5, 661615),datetime.datetime(2020, 12, 16, 14, 58, 6, 662729),datetime.datetime(2020, 12, 16, 14, 58, 10, 663151),datetime.datetime(2020, 12, 16, 14, 58, 13, 664116),datetime.datetime(2020, 12, 16, 14, 58, 16, 664501),datetime.datetime(2020, 12, 16, 14, 58, 17, 668526)]
source=['A','A','A','A','A','A','A','B','B','B']
status=['Bad','Bad','Good','Bad','Bad','Good','Good','Good','Good','Good']
df=pd.DataFrame({'time':times, 'source':source,'status':status})

Asish M. · Accepted Answer

In [299]: df
Out[299]:
                        time source status  grp
0 2020-12-16 14:57:56.647689      A    Bad    1
1 2020-12-16 14:57:59.650766      A    Bad    1
2 2020-12-16 14:58:01.655858      A   Good    2
3 2020-12-16 14:58:04.657299      A    Bad    3
4 2020-12-16 14:58:05.661615      A    Bad    3
5 2020-12-16 14:58:06.662729      A   Good    4
6 2020-12-16 14:58:10.663151      A   Good    4
7 2020-12-16 14:58:13.664116      B   Good    1
8 2020-12-16 14:58:16.664501      B   Good    1
9 2020-12-16 14:58:17.668526      B   Good    1

In [340]: df['time2'] = df.groupby('source').time.shift(-1)

In [341]: df['grp'] = df.groupby('source').status.transform(lambda x: (x != x.shift()).cumsum())


In [343]: df[df.status == 'Bad'].groupby(['source', 'grp']).agg({'time': min, 'time2': max}).reset_index().drop('grp', axis=1)
Out[343]:
  source                       time                      time2
0      A 2020-12-16 14:57:56.647689 2020-12-16 14:58:01.655858
1      A 2020-12-16 14:58:04.657299 2020-12-16 14:58:06.662729

logic

create a shifted up time column to have the next timestamp (groupby source is required as I'm assuming you wouldn't want sources to be mixed)
the grp column assigns a group number to each status within the source so that consecutive statuses get the same value
this is simple - just need to filter on the "Bad" groups and then groupby the source and grp columns and do the needed aggs

Create time duration from the data frames in pandas python

Answers (1)

Related Questions