Pandas resetting cumsum() based on a condition of another column

Question

I have a column called 'on' that contains a series of 0s and 1s:

import numpy as np
import pandas as pd

d1 = {'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]}
df = pd.DataFrame(d1)

I want to create a new column called 'value' that keeps a cumulative count (cumsum()) while the 'on' column is 1 and restarts from zero whenever the 'on' column shows 0.

I tried combining cumsum() and np.where, but the count never restarts, so I don't get what I want:

df['value_try'] = df['on'].cumsum()
df['value_try'] = np.where(df['on'] == 0, 0, df['value_try'])

Attempt:

    on  value_try
0    0          0
1    0          0
2    0          0
3    1          1
4    1          2
5    1          3
6    0          0
7    0          0
8    1          4
9    1          5
10   0          0

What my desired output would be:

    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0

SeaBean · Accepted Answer

You can group consecutive runs of 0 or 1: compare each value of on with the previous row using .shift(), then take Series.cumsum() of those change flags to get a group number for every run. Then, within each group, use GroupBy.cumsum() on the on column to get the running count. Runs of 0 sum to 0, and each run of 1 starts counting again from 1.

g = df['on'].ne(df['on'].shift()).cumsum()
df['value'] = df.groupby(g)['on'].cumsum()

Result:

print(df)

    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0
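
To make the grouping step easier to follow, here is a minimal self-contained sketch of the same approach with the intermediate values kept in named variables. The names changed and run_id are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

# True wherever 'on' differs from the previous row. The first row is True
# because shift() puts NaN there and NaN never compares equal.
changed = df['on'].ne(df['on'].shift())

# Running total of the change flags: one integer label per consecutive run.
run_id = changed.cumsum()

# Cumulative sum of 'on', restarted inside each run.
df['value'] = df.groupby(run_id)['on'].cumsum()

# Show the run labels next to the result for inspection.
print(df.assign(run_id=run_id))
```

For this input, run_id comes out as 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, so the two runs of 1s (labels 2 and 4) are summed separately. Selecting ['on'] before cumsum() keeps the result a Series, which matters if df already holds other columns such as value_try.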
