Mike Marotta
Mike Marotta

Reputation: 137

pandas - determine the duration of an event

I have a Panadas DataFrame containing the measurements (Val) of various operations (OpID) with the time-stamp for when the measurement occurred (OpTime).

OpID              OpTime  Val
 143 2014-01-01 02:35:02   20
 143 2014-01-01 02:40:01   24
 143 2014-01-01 02:40:03   0
 143 2014-01-01 02:45:01   0
 143 2014-01-01 02:50:01   20
 143 2014-01-01 02:55:01   0
 143 2014-01-01 03:00:01   20
 143 2014-01-01 03:05:01   24
 143 2014-01-01 03:10:01   20
 212 2014-01-01 02:15:01   20
 212 2014-01-01 02:17:02   0
 212 2014-01-01 02:20:01   0
 212 2014-01-01 02:25:01   0
 212 2014-01-01 02:30:01   20
 299 2014-01-01 03:30:03   33
 299 2014-01-01 03:35:02   33
 299 2014-01-01 03:40:01   34
 299 2014-01-01 03:45:01   33
 299 2014-01-01 03:45:02   34

My desire to to generate an output which only shows the time periods in which the measurement returned zero. The Start column would show the first in a series of OpTimes which resulted in zero and End would contain the OpTime of the first Val that was non-zero.

Given the above sample data my desired result is:

OpID                      Start                     End
 143        2014-01-01 02:40:03     2014-01-01 02:50:01
 143        2014-01-01 02:55:01     2014-01-01 03:00:01
 212        2014-01-01 02:17:02     2014-01-01 02:30:01

Upvotes: 0

Views: 511

Answers (1)

CT Zhu
CT Zhu

Reputation: 54330

How about using pivot()?

import numpy as np
import pandas as pd

df['Zeros'] = (df['Val']==0)
df['Valdf'] = np.hstack((nan, diff(df['Zeros'].values))) #how do you treat the first cell?
df['Valdr'] = np.hstack((diff(df['Zeros'].values), nan)) #how do you treat the last cell?
pf2 = pd.concat([df[((df['Zeros']!=True)&(df['Valdf']==1))],df[((df['Zeros'])&(df['Valdr']==1))]]).sort_index()
pf2['State'] = np.where(pf2['Zeros'], 'Start', 'End')
pf2['idx2'] = pf2['OpID'] + np.arange(len(pf2))/2*0.00000001 #need unique index for .pivot()
print pf2.pivot(index='idx2', columns='State', values='OpTime')

State                  End                Start
idx2                                           
143.0  2014-01-01 02:50:01  2014-01-01 02:45:01
143.1  2014-01-01 03:00:01  2014-01-01 02:55:01
212.2  2014-01-01 02:30:01  2014-01-01 02:25:01

To get the one you presented, probably just:

pf3 = pd.DataFrame(pf2.pivot(index='idx2', columns='State', values='OpTime'))
pf3.index=asarray(pf3.index.values, 'int')

Upvotes: 1

Related Questions