robmsmt
robmsmt

Reputation: 1529

Pandas cumsum on a separate column condition

I have an issue using Pandas and cumsum which is not behaving as I was expecting so was wondering if anyone could shed some light on how this works.

I have a dataframe that looks as follows:

|     |price    |flag  |cum_sum  |
|-----|---------|------|---------|
|0    |2        |1     |2        |
|1    |5        |1     |7        |
|2    |8        |1     |15       |
|3    |9        |0     |0        |
|4    |12       |0     |0        |
|5    |2        |1     |17       |

Currently the code looks as follows:

df['cum_sum'] = df.groupby(by=['flag','price']).sum().groupby(level=[1]).cumsum()

I only want it to sum a column where a flag is specified. I feel like this should be simple but i'm missing something fundamental. The dataset is huge so was not looking for any loops or iteration answers.

Upvotes: 0

Views: 1992

Answers (2)

Chris Menezes
Chris Menezes

Reputation: 71

Looks like you have most of what you want. If you want null values (NaN) for 0-flags then do this:

df['cum_sum'] = df[df['flag'] == 1]['price'].cumsum()

flag price cum_sum 0 1 2 2.0 1 1 5 7.0 2 1 8 15.0 3 0 9 NaN 4 0 12 NaN 5 1 2 17.0

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210832

Is this what you want?

In [15]: df.price.mul(df.flag).cumsum().mul(df.flag)
Out[15]:
0     2
1     7
2    15
3     0
4     0
5    17
dtype: int64

Upvotes: 0

Related Questions