Taking Differences of Records When Status Changes - Pandas

Question

I have customer records with id, timestamp and status.

ID, TS, STATUS
1 10 GOOD
1 20 GOOD
1 25 BAD
1 30 BAD
1 50 BAD
1 600 GOOD
2 40 GOOD
.. ...

I am trying to calculate how much time is spent in consecutive BAD statuses per customer (let's assume the order above is correct). So for customer id=1, the intervals 30-25, 50-30 and 600-50 add up to a total of 575 seconds spent in BAD status.

What is the method of doing this in Pandas? If I calculate .diff() on TS, that would give me differences, but how can I tie that 1) to the customer 2) certain status "blocks" for that customer?

Sample data:

import pandas

df = pandas.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 2],
                       'TS': [10, 20, 25, 30, 50, 600, 40],
                       'Status': ['G', 'G', 'B', 'B', 'B', 'G', 'G']},
                      columns=['ID', 'TS', 'Status'])

Thanks,

Zelazny7 · Accepted Answer

In [1]: df = DataFrame({'ID':[1,1,1,1,1,2,2],
   ...:                 'TS':[10,20,25,30,50,10,40],
   ...:                 'Status':['G','G','B','B','B','B','B']},
   ...:                columns=['ID','TS','Status'])

In [2]: f = lambda x: x.diff().sum()

In [3]: df['diff'] = df[df.Status=='B'].groupby('ID')['TS'].transform(f)

In [4]: df
Out[4]:
   ID  TS Status  diff
0   1  10      G   NaN
1   1  20      G   NaN
2   1  25      B    25
3   1  30      B    25
4   1  50      B    25
5   2  10      B    30
6   2  40      B    30

Explanation: Subset the dataframe to only the records with the desired Status. Group by ID and apply the lambda function, diff().sum(), to each group's TS values. Use transform instead of apply because transform returns a Series indexed like the subset, so it can be assigned directly to a new column 'diff'; rows outside the subset are left as NaN.
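To make the apply-versus-transform point concrete, here is a minimal sketch on the same sample data. The names bad, per_id and per_row are illustrative only and not part of the original answer; agg stands in for the reducing case (one value per ID), while transform gives one value per row.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 2, 2],
                   'TS': [10, 20, 25, 30, 50, 10, 40],
                   'Status': ['G', 'G', 'B', 'B', 'B', 'B', 'B']})

# Illustrative name: the subset of rows with the status of interest.
bad = df[df['Status'] == 'B']

# Reducing form: one total per ID, indexed by ID (cannot be assigned row-wise).
per_id = bad.groupby('ID')['TS'].agg(lambda x: x.diff().sum())

# transform: the same total repeated on every row of the group,
# indexed like `bad`, so it lines up with the original frame.
per_row = bad.groupby('ID')['TS'].transform(lambda x: x.diff().sum())

# Index alignment fills the rows that are not in `bad` with NaN.
df['diff'] = per_row
```

Here per_id is indexed by ID (1 and 2), whereas per_row keeps the row labels 2 through 6, which is what makes the column assignment work.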

EDIT: New response to account for expanded question scope.

In [1]: df
Out[1]:
   ID   TS Status
0   1   10      G
1   1   20      G
2   1   25      B
3   1   30      B
4   1   50      B
5   1  600      G
6   2   40      G

In [2]: df['shift'] = -df['TS'].diff(-1)

In [3]: df['diff'] = df[df.Status=='B'].groupby('ID')['shift'].transform('sum')

In [4]: df
Out[4]:
   ID   TS Status  shift  diff
0   1   10      G     10   NaN
1   1   20      G      5   NaN
2   1   25      B      5   575
3   1   30      B     20   575
4   1   50      B    550   575
5   1  600      G   -560   NaN
6   2   40      G    NaN   NaN
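One thing to watch in the output above: the shift column is computed over the whole frame, so row 5 picks up -560 from customer 2's first timestamp. That does not affect this result because row 5 is not a B row, but it would if a customer's last record were B. A possible variant, offered as a sketch rather than as part of the original answer, computes the forward gap per customer and also labels each run of identical statuses so totals can be taken per block. The names next_ts, bad_time, block and per_block are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 1, 1, 2],
                   'TS': [10, 20, 25, 30, 50, 600, 40],
                   'Status': ['G', 'G', 'B', 'B', 'B', 'G', 'G']})

# Time until the next record of the same customer;
# NaN on each customer's last row instead of a cross-customer difference.
next_ts = df.groupby('ID')['TS'].shift(-1)
df['shift'] = next_ts - df['TS']

# Total time in B per customer (customers with no B rows are absent).
bad_time = df.loc[df['Status'] == 'B'].groupby('ID')['shift'].sum()

# Illustrative block label: a new block starts whenever the ID or the
# Status differs from the previous row.
new_block = (df['Status'] != df['Status'].shift()) | (df['ID'] != df['ID'].shift())
df['block'] = new_block.cumsum()

# Time in B per customer and per consecutive block of B rows.
per_block = df.loc[df['Status'] == 'B'].groupby(['ID', 'block'])['shift'].sum()
```

With the sample data, bad_time holds 575 for customer 1, matching the figure in the question, and per_block has a single entry because customer 1 has only one run of B rows.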
