Reputation: 759
I would like to perform the following function on a dataframe.
Calculate the cumulative sum of a column, notice:
It looks at the previous index only, not including the current one, e.g. the very first one will be zero as there is no previous data to look at.
When it doesn't cumulate, e.g the increment is zero, it restarts the count.
Number Cumulative
0 1 0
1 1 1
2 1 2
3 0 3
4 0 0
5 1 0
6 1 1
7 0 2
I know there is an expanding function, but it doesnt restart when it sees zero
Upvotes: 1
Views: 1545
Reputation: 51345
IIUC, this works by making groups according to whether the previous row was 0, then getting the cumulative count:
>>> df
Number
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 0
df['Cumulative'] = df.groupby(df.Number.shift().eq(0).cumsum()).cumcount()
>>> df
Number Cumulative
0 1 0
1 1 1
2 1 2
3 0 3
4 0 0
5 1 0
6 1 1
7 0 2
Alternatively, if it really is cumsum
you want, then apply cumsum with the same grouping as above, and shift it 1 down:
df['Cumulative '] = df.groupby(df.Number.eq(0).cumsum()).cumsum().shift().fillna(0)
>>> df
Number Cumulative
0 1 0.0
1 1 1.0
2 1 2.0
3 0 3.0
4 0 0.0
5 1 0.0
6 1 1.0
7 0 2.0
Upvotes: 1