Reputation: 147
I have a column called 'on' with a series of 0 and 1:
import numpy as np
import pandas as pd
d1 = {'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]}
df = pd.DataFrame(d1)
I want to create a new column called 'value' that holds a cumulative count (cumsum()) which increases only while the 'on' column is 1 and restarts from zero whenever the 'on' column shows 0.
I tried a combination of cumsum() and np.where, but the count never resets between runs of 1s:
df['value_try'] = df['on'].cumsum()
df['value_try'] = np.where(df['on'] == 0, 0, df['value_try'])
Attempt:
    on  value_try
0    0          0
1    0          0
2    0          0
3    1          1
4    1          2
5    1          3
6    0          0
7    0          0
8    1          4
9    1          5
10   0          0
My desired output would be:
    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0
Upvotes: 3
Views: 749
Reputation: 23237
You can group consecutive runs of 0s or 1s by checking whether each value of 'on' differs from that of the previous row with .shift(), then taking Series.cumsum() of that comparison to get a group number. Then, within each group, use GroupBy.cumsum() on 'on' to get the running value.
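# Label each run of consecutive equal values with its own group number
g = df['on'].ne(df['on'].shift()).cumsum()
# Select 'on' explicitly so the result is a Series even if df has other columns
df['value'] = df.groupby(g)['on'].cumsum()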
Result:
print(df)
    on  value
0    0      0
1    0      0
2    0      0
3    1      1
4    1      2
5    1      3
6    0      0
7    0      0
8    1      1
9    1      2
10   0      0
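To make the grouping step easier to follow, here is a minimal sketch that prints the intermediate group key next to the final result. The names changed, steps and the 'group' column are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

# True wherever the value differs from the previous row
# (the first row is compared against NaN, so it always counts as a change)
changed = df['on'].ne(df['on'].shift())

# Running total of change points: one label per run of identical values
g = changed.cumsum()

# Cumulative sum of 'on', restarted inside each run
value = df.groupby(g)['on'].cumsum()

# Illustrative side-by-side view of input, group label and result
steps = pd.DataFrame({'on': df['on'], 'group': g, 'value': value})
print(steps)
```

Runs of 0s sum to 0 and runs of 1s count up from 1, which is why the sum restarts at every change of value.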
Upvotes: 2
Reputation: 323396
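Let us try cumcount + cumsum. Every 0 starts a new group, so the position of a row within its group is the number of 1s seen since the last 0: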
df['out'] = df.groupby(df['on'].eq(0).cumsum()).cumcount()
Out[18]:
0     0
1     0
2     0
3     1
4     2
5     3
6     0
7     0
8     1
9     2
10    0
dtype: int64
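As a rough sketch of why this works, the snippet below shows the group key beside the output, and also checks one edge case. The names key, out, edge and edge_out are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'on': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0]})

# Each 0 bumps the counter, so every 0 opens a new group
# that also contains the 1s that follow it
key = df['on'].eq(0).cumsum()

# Position within each group: the 0 gets 0, the following 1s get 1, 2, 3, ...
out = df.groupby(key).cumcount()
print(pd.DataFrame({'on': df['on'], 'key': key, 'out': out}))

# Caveat: a column that starts with 1 has no leading 0 in its first group,
# so that first run is counted from 0 instead of 1
edge = pd.Series([1, 1, 0])
edge_out = edge.groupby(edge.eq(0).cumsum()).cumcount()
print(edge_out.tolist())  # [0, 1, 0]
```

If the data can begin with a 1, the shift-based grouping with cumsum() avoids this off-by-one in the first run.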
Upvotes: 1