manwong0606

Reputation: 147

Pandas resetting cumsum() based on a condition of another column

I have a column called 'on' with a series of 0 and 1:

import numpy as np
import pandas as pd

d1 = {'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]}
df = pd.DataFrame(d1)

I want to create a new column called 'value' that holds a cumulative count (cumsum()) while the 'on' column is 1 and restarts from zero whenever the 'on' column is 0.

I tried combining cumsum() and np.where, but the count does not restart after a zero:

df['value_try'] = df['on'].cumsum()
df['value_try'] = np.where(df['on'] == 0, 0, df['value_try'])

Attempt:

    on  value_try
0    0          0
1    0          0
2    0          0
3    1          1
4    1          2
5    1          3
6    0          0
7    0          0
8    1          4
9    1          5
10   0          0

What my desired output would be:

    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0

Upvotes: 3

Views: 749

Answers (2)

SeaBean

Reputation: 23237

You can group consecutive runs of 0s or 1s: compare each value of on with the previous row using .shift(), then take Series.cumsum() of the resulting change flags to get a group number for each run. Then use GroupBy.cumsum() within each group to get the value, which restarts at every run.

g = df['on'].ne(df['on'].shift()).cumsum()
df['value'] = df.groupby(g)['on'].cumsum()

Result:

print(df)

    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0
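
To see why this works, it can help to look at the intermediate group key. The following is a minimal sketch on the question's data; the variable names changed and group are only for illustration and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

# True wherever 'on' differs from the previous row; the first row is
# always True because shift() puts NaN there
changed = df['on'].ne(df['on'].shift())

# Running total of the change flags: each run of equal values gets its own id
group = changed.cumsum()

# Sum 'on' within each run: runs of 0 stay at 0, runs of 1 count 1, 2, 3, ...
df['value'] = df.groupby(group)['on'].cumsum()

# Show the group id next to the result without adding it to df
print(df.assign(group=group))
```

Selecting ['on'] before cumsum() keeps the result a single Series, which should also be safer if df already holds other columns such as value_try.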

Upvotes: 2

BENY

Reputation: 323396

Let us try cumcount + cumsum: start a new group at every 0 with cumsum, then number the rows within each group with cumcount.

df['out'] = df.groupby(df['on'].eq(0).cumsum()).cumcount()
Out[18]: 
0     0
1     0
2     0
3     1
4     2
5     3
6     0
7     0
8     1
9     2
10    0
dtype: int64
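
One thing to watch, as far as I can tell: cumcount numbers rows from 0 within each group, so this relies on every group starting with a 0 row. If the column starts with 1, the first run would be counted from 0 instead of 1. A small sketch on a hypothetical input (not from the question) that compares cumcount with summing 'on' over the same group key:

```python
import pandas as pd

# Hypothetical input that starts with 1 rather than 0
s = pd.DataFrame({'on': [1, 1, 0, 1]})

# Same group key as above: a new group begins at every 0
key = s['on'].eq(0).cumsum()

# cumcount starts at 0 in the leading group, which has no 0 row
by_count = s.groupby(key).cumcount()

# Summing 'on' within the same groups counts only the 1s
by_sum = s.groupby(key)['on'].cumsum()

print(by_count.tolist())  # [0, 1, 0, 1]
print(by_sum.tolist())    # [1, 2, 0, 1]
```

On the data in the question both variants give the same result, since that column starts with 0.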

Upvotes: 1
