Benjamin Simon
Benjamin Simon

Reputation: 23

Pandas - Summarize values between two rows between two values

I have been thinking in a more efficient way to summarize values in a dataframe between two values.

So, I have this dataframe:

   Event   Value
   ==============
   Start   -
   Action  11
   Action  6
   Action  3
   End     -
   Start   -
   Action  8
   Action  6
   End     -

So what I want to do is sum all the values between Start and End and store the value in the initial Start

   Event    Value
   ==============
   Start   20
   Action  11
   Action  6
   Action  3
   End     -
   Start   14
   Action  8
   Action  6
   End     -

Is there a way to do this without using loops?

Thanks for your help!

Upvotes: 2

Views: 125

Answers (1)

Scott Boston
Scott Boston

Reputation: 153480

Try this:

df['Value'] = df['Value'].replace('-', np.nan).astype(float)

df['Value'] = (df.groupby(df['Event'].eq('Start').cumsum())['Value']
                 .transform('sum')
                 .mask(df['Event'] != 'Start', df['Value'])
                 .fillna('-'))

Output:

    Event Value
1   Start    20
2  Action    11
3  Action     6
4  Action     3
5     End     -
6   Start    14
7  Action     8
8  Action     6
9     End     -

Details:

  • Use groupby on a cumsum of events where event equals 'Start', to create groups
  • Then use transfrom to calculate sum and mask events not equal to start with the original 'Value'
  • And then use fillna to get '-' character back

Upvotes: 3

Related Questions