Difference of event time for each group

Question

Here is a simplified version of huge data I am working on, the data is sorted by time

df = pd.DataFrame({'group':['grp1','grp1','grp1','grp1', 'grp2','grp2','grp2','grp2','grp2'],'event':['foo','bar','foo','bar','foo','bar','bar','foo','bar'], 'time':[10,21,33,54,10,21,56,81,95]})

Qn: I need to find the difference in time (delay) for each pair of foo/bar (bar - foo) for each group. It becomes easy for group1 as foo/bar are equal in number and well aligned (each foo is followed by bar). I would do something like,

final = df.pivot_table(index = 'group', columns = 'event', values = 'time', aggfunc='sum')
final['delay'] = final['bar'] - final['foo']

grp1    32
grp2    81

The above result is good for grp1 but for grp2, since the event bar is repeating at index 5 & 6, I would have to ignore bar value at index 6 and get result,

grp1    32
grp2    25

Caveat: The events have to be lined up in the exact order as in grp1. If bar appears first, the corresponding time would be ignored and we start from next foo.

Would appreciate suggestions on this

anky · Accepted Answer

You can try with masking through the conditions and then do the pivot (explanations inline in comments):

c1 = df['event'].eq("foo")          #check if current row is foo
c2 = df['event'].shift(-1).eq("bar") #check if next row is bar
cond = c1&c2                          #chaining c1 and c2
#if cond is True , take next row as True since bar is already below foo per cond
final = (df[cond|cond.shift()].pivot_table(index = 'group', 
        columns = 'event', values = 'time', aggfunc='sum').copy())
final['bar']-final['foo']

group
grp1    32
grp2    25
dtype: int64

Difference of event time for each group

Answers (2)

Related Questions