Victor
Victor

Reputation: 759

Cumulative sum of a dataframe column with restart

I would like to perform the following function on a dataframe.

Calculate the cumulative sum of a column, notice:

  1. It looks at the previous index only, not including the current one, e.g. the very first one will be zero as there is no previous data to look at.

  2. When it doesn't cumulate, e.g the increment is zero, it restarts the count.

          Number  Cumulative
       0       1           0
       1       1           1
       2       1           2
       3       0           3
       4       0           0
       5       1           0
       6       1           1
       7       0           2
    

I know there is an expanding function, but it doesnt restart when it sees zero

Upvotes: 1

Views: 1545

Answers (1)

sacuL
sacuL

Reputation: 51345

IIUC, this works by making groups according to whether the previous row was 0, then getting the cumulative count:

>>> df
   Number
0       1
1       1
2       1
3       0
4       0
5       1
6       1
7       0

df['Cumulative'] = df.groupby(df.Number.shift().eq(0).cumsum()).cumcount()

>>> df
   Number  Cumulative
0       1           0
1       1           1
2       1           2
3       0           3
4       0           0
5       1           0
6       1           1
7       0           2

Alternatively, if it really is cumsum you want, then apply cumsum with the same grouping as above, and shift it 1 down:

df['Cumulative '] = df.groupby(df.Number.eq(0).cumsum()).cumsum().shift().fillna(0)
>>> df
   Number  Cumulative 
0       1          0.0
1       1          1.0
2       1          2.0
3       0          3.0
4       0          0.0
5       1          0.0
6       1          1.0
7       0          2.0

Upvotes: 1

Related Questions