pandachii
pandachii

Reputation: 13

How to cumsum streak with reset in pandas

I have the following dataframe:

df = pd.DataFrame({'col1':['A','A','A','A','A','A','A','B','B','B','B','B','B','B'],'col2':[1,1,1,0,1,1,1,1,0,1,1,0,0,0],'col3':[.5,.5,.3,.2,.1,.4,.7,.4,.5,.3,.2,.4,.5,.6]})

I need a running sum of col3, but the sum should reset whenever col1 changes or if there is a zero in col2. I should return something like in the following: mock output

I tried the following:

df['cum'] = df.groupby([df.col1, df.col2]).col3.cumsum()

However, it is not resetting correctly. It resets but continues summing the the subsequent rows.

Upvotes: 1

Views: 184

Answers (1)

rafaelc
rafaelc

Reputation: 59274

Taking a shot here

zeros = df.col2.eq(0)
vals = df.groupby(['col1', df.col2.ne(df.col2.shift()).cumsum()]).col3.cumsum()

df['id'] = np.where(zeros, 0, vals)

Explanation:

Whenever col2 is 0, just set 0. Else, just set the cumsum of the grouped values.

Upvotes: 1

Related Questions