Reputation: 23
I have been thinking in a more efficient way to summarize values in a dataframe between two values.
So, I have this dataframe:
Event Value
==============
Start -
Action 11
Action 6
Action 3
End -
Start -
Action 8
Action 6
End -
So what I want to do is sum all the values between Start and End and store the value in the initial Start
Event Value
==============
Start 20
Action 11
Action 6
Action 3
End -
Start 14
Action 8
Action 6
End -
Is there a way to do this without using loops?
Thanks for your help!
Upvotes: 2
Views: 125
Reputation: 153480
Try this:
df['Value'] = df['Value'].replace('-', np.nan).astype(float)
df['Value'] = (df.groupby(df['Event'].eq('Start').cumsum())['Value']
.transform('sum')
.mask(df['Event'] != 'Start', df['Value'])
.fillna('-'))
Output:
Event Value
1 Start 20
2 Action 11
3 Action 6
4 Action 3
5 End -
6 Start 14
7 Action 8
8 Action 6
9 End -
Details:
groupby
on a cumsum
of events where event equals 'Start', to
create groupstransfrom
to calculate sum and mask
events not equal to
start with the original 'Value'fillna
to get '-' character backUpvotes: 3