Reputation: 1529
I have an issue using Pandas and cumsum which is not behaving as I was expecting so was wondering if anyone could shed some light on how this works.
I have a dataframe that looks as follows:
| |price |flag |cum_sum |
|-----|---------|------|---------|
|0 |2 |1 |2 |
|1 |5 |1 |7 |
|2 |8 |1 |15 |
|3 |9 |0 |0 |
|4 |12 |0 |0 |
|5 |2 |1 |17 |
Currently the code looks as follows:
df['cum_sum'] = df.groupby(by=['flag','price']).sum().groupby(level=[1]).cumsum()
I only want it to sum a column where a flag is specified. I feel like this should be simple but i'm missing something fundamental. The dataset is huge so was not looking for any loops or iteration answers.
Upvotes: 0
Views: 1992
Reputation: 71
Looks like you have most of what you want. If you want null values (NaN) for 0-flags then do this:
df['cum_sum'] = df[df['flag'] == 1]['price'].cumsum()
flag price cum_sum
0 1 2 2.0
1 1 5 7.0
2 1 8 15.0
3 0 9 NaN
4 0 12 NaN
5 1 2 17.0
Upvotes: 1
Reputation: 210832
Is this what you want?
In [15]: df.price.mul(df.flag).cumsum().mul(df.flag)
Out[15]:
0 2
1 7
2 15
3 0
4 0
5 17
dtype: int64
Upvotes: 0