Pandas - Summarize values between two rows between two values

Question

I have been thinking in a more efficient way to summarize values in a dataframe between two values.

So, I have this dataframe:

   Event   Value
   ==============
   Start   -
   Action  11
   Action  6
   Action  3
   End     -
   Start   -
   Action  8
   Action  6
   End     -

So what I want to do is sum all the values between Start and End and store the value in the initial Start

   Event    Value
   ==============
   Start   20
   Action  11
   Action  6
   Action  3
   End     -
   Start   14
   Action  8
   Action  6
   End     -

Is there a way to do this without using loops?

Thanks for your help!

Scott Boston · Accepted Answer

Try this:

df['Value'] = df['Value'].replace('-', np.nan).astype(float)

df['Value'] = (df.groupby(df['Event'].eq('Start').cumsum())['Value']
                 .transform('sum')
                 .mask(df['Event'] != 'Start', df['Value'])
                 .fillna('-'))

Output:

    Event Value
1   Start    20
2  Action    11
3  Action     6
4  Action     3
5     End     -
6   Start    14
7  Action     8
8  Action     6
9     End     -

Details:

Use groupby on a cumsum of events where event equals 'Start', to create groups
Then use transfrom to calculate sum and mask events not equal to start with the original 'Value'
And then use fillna to get '-' character back

Pandas - Summarize values between two rows between two values

Answers (1)

Related Questions