Pandas cumsum on a separate column condition

Question

I have an issue using Pandas and cumsum which is not behaving as I was expecting so was wondering if anyone could shed some light on how this works.

I have a dataframe that looks as follows:

|     |price    |flag  |cum_sum  |
|-----|---------|------|---------|
|0    |2        |1     |2        |
|1    |5        |1     |7        |
|2    |8        |1     |15       |
|3    |9        |0     |0        |
|4    |12       |0     |0        |
|5    |2        |1     |17       |

Currently the code looks as follows:

df['cum_sum'] = df.groupby(by=['flag','price']).sum().groupby(level=[1]).cumsum()

I only want it to sum a column where a flag is specified. I feel like this should be simple but i'm missing something fundamental. The dataset is huge so was not looking for any loops or iteration answers.

Chris Menezes · Accepted Answer

Looks like you have most of what you want. If you want null values (NaN) for 0-flags then do this:

df['cum_sum'] = df[df['flag'] == 1]['price'].cumsum()

flag price cum_sum 0 1 2 2.0 1 1 5 7.0 2 1 8 15.0 3 0 9 NaN 4 0 12 NaN 5 1 2 17.0

Pandas cumsum on a separate column condition

Answers (2)

Related Questions