Ravi
Ravi

Reputation: 3217

Create time duration from the data frames in pandas python

I have the below data frame, now I wanted to calculate how long each SOURCE in the Bad status.

enter image description here

For example, SOURCE A in the bad status from 2020-12-16 14:57:56 to 2020-12-16-14:58:01 which is around 5 seconds, and similarly SOURCE A is also in bad status for different time stamps. I wanted to calculate the duration for each source whenever its in Bad status

enter image description here

Below is the data frame code:

import pandas as pd
import datetime

times=[datetime.datetime(2020, 12, 16, 14, 57, 56, 647689),datetime.datetime(2020, 12, 16, 14, 57, 59, 650766),datetime.datetime(2020, 12, 16, 14, 58, 1, 655858),datetime.datetime(2020, 12, 16, 14, 58, 4, 657299),datetime.datetime(2020, 12, 16, 14, 58, 5, 661615),datetime.datetime(2020, 12, 16, 14, 58, 6, 662729),datetime.datetime(2020, 12, 16, 14, 58, 10, 663151),datetime.datetime(2020, 12, 16, 14, 58, 13, 664116),datetime.datetime(2020, 12, 16, 14, 58, 16, 664501),datetime.datetime(2020, 12, 16, 14, 58, 17, 668526)]
source=['A','A','A','A','A','A','A','B','B','B']
status=['Bad','Bad','Good','Bad','Bad','Good','Good','Good','Good','Good']
df=pd.DataFrame({'time':times, 'source':source,'status':status})

Upvotes: 1

Views: 123

Answers (1)

Asish M.
Asish M.

Reputation: 2647

In [299]: df
Out[299]:
                        time source status  grp
0 2020-12-16 14:57:56.647689      A    Bad    1
1 2020-12-16 14:57:59.650766      A    Bad    1
2 2020-12-16 14:58:01.655858      A   Good    2
3 2020-12-16 14:58:04.657299      A    Bad    3
4 2020-12-16 14:58:05.661615      A    Bad    3
5 2020-12-16 14:58:06.662729      A   Good    4
6 2020-12-16 14:58:10.663151      A   Good    4
7 2020-12-16 14:58:13.664116      B   Good    1
8 2020-12-16 14:58:16.664501      B   Good    1
9 2020-12-16 14:58:17.668526      B   Good    1

In [340]: df['time2'] = df.groupby('source').time.shift(-1)

In [341]: df['grp'] = df.groupby('source').status.transform(lambda x: (x != x.shift()).cumsum())


In [343]: df[df.status == 'Bad'].groupby(['source', 'grp']).agg({'time': min, 'time2': max}).reset_index().drop('grp', axis=1)
Out[343]:
  source                       time                      time2
0      A 2020-12-16 14:57:56.647689 2020-12-16 14:58:01.655858
1      A 2020-12-16 14:58:04.657299 2020-12-16 14:58:06.662729

logic

  1. create a shifted up time column to have the next timestamp (groupby source is required as I'm assuming you wouldn't want sources to be mixed)
  2. the grp column assigns a group number to each status within the source so that consecutive statuses get the same value
  3. this is simple - just need to filter on the "Bad" groups and then groupby the source and grp columns and do the needed aggs

Upvotes: 1

Related Questions