Vaishali
Vaishali

Reputation: 38415

Difference of event time for each group

Here is a simplified version of huge data I am working on, the data is sorted by time

df = pd.DataFrame({'group':['grp1','grp1','grp1','grp1', 'grp2','grp2','grp2','grp2','grp2'],'event':['foo','bar','foo','bar','foo','bar','bar','foo','bar'], 'time':[10,21,33,54,10,21,56,81,95]})

Qn: I need to find the difference in time (delay) for each pair of foo/bar (bar - foo) for each group. It becomes easy for group1 as foo/bar are equal in number and well aligned (each foo is followed by bar). I would do something like,

final = df.pivot_table(index = 'group', columns = 'event', values = 'time', aggfunc='sum')
final['delay'] = final['bar'] - final['foo']

grp1    32
grp2    81

The above result is good for grp1 but for grp2, since the event bar is repeating at index 5 & 6, I would have to ignore bar value at index 6 and get result,

grp1    32
grp2    25 

Caveat: The events have to be lined up in the exact order as in grp1. If bar appears first, the corresponding time would be ignored and we start from next foo.

Would appreciate suggestions on this

Upvotes: 2

Views: 88

Answers (2)

Ben.T
Ben.T

Reputation: 29635

You can remove rows where row is equal to the shift on any of the 2 columns like:

final = (df[df[['group', 'event']].ne(df[['group', 'event']].shift()).any(1)]
           .pivot_table(index = 'group', columns = 'event', 
                        values = 'time', aggfunc='sum')
        )
final['delay'] = final['bar'] - final['foo']

print (final)
event  bar  foo  delay
group                 
grp1    75   43     32
grp2   116   91     25

for the caveat, I would say you need to add an extra condition in the loc

first = df.loc[0, 'event']
final = (df[df[['group', 'event']].ne(df[['group', 'event']].shift()).any(1)
            &~(df['group'].ne(df['group'].shift())&df['event'].ne(first))] #caveat cond
           .pivot...

Upvotes: 1

anky
anky

Reputation: 75080

You can try with masking through the conditions and then do the pivot (explanations inline in comments):

c1 = df['event'].eq("foo")          #check if current row is foo
c2 = df['event'].shift(-1).eq("bar") #check if next row is bar
cond = c1&c2                          #chaining c1 and c2
#if cond is True , take next row as True since bar is already below foo per cond
final = (df[cond|cond.shift()].pivot_table(index = 'group', 
        columns = 'event', values = 'time', aggfunc='sum').copy())
final['bar']-final['foo']

group
grp1    32
grp2    25
dtype: int64

Upvotes: 2

Related Questions