Reputation: 38415
Here is a simplified version of huge data I am working on, the data is sorted by time
df = pd.DataFrame({'group':['grp1','grp1','grp1','grp1', 'grp2','grp2','grp2','grp2','grp2'],'event':['foo','bar','foo','bar','foo','bar','bar','foo','bar'], 'time':[10,21,33,54,10,21,56,81,95]})
Qn: I need to find the difference in time (delay) for each pair of foo/bar (bar - foo) for each group. It becomes easy for group1 as foo/bar are equal in number and well aligned (each foo is followed by bar). I would do something like,
final = df.pivot_table(index = 'group', columns = 'event', values = 'time', aggfunc='sum')
final['delay'] = final['bar'] - final['foo']
grp1 32
grp2 81
The above result is good for grp1 but for grp2, since the event bar is repeating at index 5 & 6, I would have to ignore bar value at index 6 and get result,
grp1 32
grp2 25
Caveat: The events have to be lined up in the exact order as in grp1. If bar appears first, the corresponding time would be ignored and we start from next foo.
Would appreciate suggestions on this
Upvotes: 2
Views: 88
Reputation: 29635
You can remove rows where row is equal to the shift
on any
of the 2 columns like:
final = (df[df[['group', 'event']].ne(df[['group', 'event']].shift()).any(1)]
.pivot_table(index = 'group', columns = 'event',
values = 'time', aggfunc='sum')
)
final['delay'] = final['bar'] - final['foo']
print (final)
event bar foo delay
group
grp1 75 43 32
grp2 116 91 25
for the caveat, I would say you need to add an extra condition in the loc
first = df.loc[0, 'event']
final = (df[df[['group', 'event']].ne(df[['group', 'event']].shift()).any(1)
&~(df['group'].ne(df['group'].shift())&df['event'].ne(first))] #caveat cond
.pivot...
Upvotes: 1
Reputation: 75080
You can try with masking through the conditions and then do the pivot (explanations inline in comments):
c1 = df['event'].eq("foo") #check if current row is foo
c2 = df['event'].shift(-1).eq("bar") #check if next row is bar
cond = c1&c2 #chaining c1 and c2
#if cond is True , take next row as True since bar is already below foo per cond
final = (df[cond|cond.shift()].pivot_table(index = 'group',
columns = 'event', values = 'time', aggfunc='sum').copy())
final['bar']-final['foo']
group
grp1 32
grp2 25
dtype: int64
Upvotes: 2