Reputation: 13
I have the following dataframe:
df = pd.DataFrame({'col1':['A','A','A','A','A','A','A','B','B','B','B','B','B','B'],'col2':[1,1,1,0,1,1,1,1,0,1,1,0,0,0],'col3':[.5,.5,.3,.2,.1,.4,.7,.4,.5,.3,.2,.4,.5,.6]})
I need a running sum of col3, but the sum should reset whenever col1 changes or if there is a zero in col2. I should return something like in the following: mock output
I tried the following:
df['cum'] = df.groupby([df.col1, df.col2]).col3.cumsum()
However, it is not resetting correctly. It resets but continues summing the the subsequent rows.
Upvotes: 1
Views: 184
Reputation: 59274
Taking a shot here
zeros = df.col2.eq(0)
vals = df.groupby(['col1', df.col2.ne(df.col2.shift()).cumsum()]).col3.cumsum()
df['id'] = np.where(zeros, 0, vals)
Explanation:
Whenever col2
is 0
, just set 0
. Else, just set the cumsum
of the grouped values.
Upvotes: 1