Reputation: 909
I have a dataframe like the following:
col1 col2
0 1 True
1 3 True
2 3 True
3 1 False
4 2 True
5 3 True
6 2 False
7 2 True
I want to get a running sum of True
values. Whenever I see a False
value in col2
, I need to take the cumulative sum of col1
up to that point. So, the DataFrame would look like the following:
col1 col2 col3
0 1 True 0
1 3 True 0
2 3 True 0
3 1 False 7
4 2 True 0
5 3 True 0
6 2 False 5
7 2 True 0
How can I do this?
Upvotes: -1
Views: 235
Reputation: 862691
You can use more general solution which works nice with multiple consecutive False
- then cumulative sum value is not changed:
a = df.groupby((df.col2 != df.col2.shift()).cumsum())['col1'].transform('sum')
df['d'] = a.where(df.col2).ffill().mask(df.col2).fillna(0).astype(int)
print (df)
col1 col2 d
0 1 True 0
1 3 True 0
2 3 True 0
3 1 False 7
4 2 True 0
5 3 True 0
6 2 False 5
7 2 True 0
#added 2 last rows with False in col2
print (df)
col1 col2
0 1 True
1 3 True
2 3 True
3 1 False
4 2 True
5 3 True
6 2 False
7 2 True
8 4 False
9 4 False
a = df.groupby((df.col2 != df.col2.shift()).cumsum())['col1'].transform('sum')
df['d'] = a.where(df.col2).ffill().mask(df.col2).fillna(0).astype(int)
print (df)
col1 col2 d
0 1 True 0
1 3 True 0
2 3 True 0
3 1 False 7
4 2 True 0
5 3 True 0
6 2 False 5
7 2 True 0
8 4 False 2
9 4 False 2
Upvotes: 3
Reputation: 214957
You can create a group variable with cumsum on col2
and then calculate the sum per group:
df.loc[~df.col2, 'col3'] = (df.col1 * df.col2).groupby(by = (~df.col2).cumsum()).cumsum().shift()
df.fillna(0)
Upvotes: 2