Reputation: 125
I have a data frame that looks like the following
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],'value':[1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6]})
df['value_at_1'] = np.where(df['value'] == 1,1,0)
df
>>>
id value value_at_1
1 1 1
2 2.5 0
3 1.1 0
4 1.4 0
5 1.5 0
6 1 1
7 1.5 0
8 3 0
9 1 1
10 1.6 0
I would like to make a variable that enumerates over the "value_at_1" variable and increments each time until it hits 1, and then restarts. The result would like like this:
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],
'value':[1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6],
'value_at_1':[1,0,0,0,0,1,0,0,1,0],
'count_since_1':[0,1,2,3,4,0,1,2,0,1]}).set_index(['value_at_1'])
>>>
id value value_at_1 count_since_1
1 1 1 0
2 2.5 0 1
3 1.1 0 2
4 1.4 0 3
5 1.5 0 4
6 1 1 0
7 1.5 0 1
8 3 0 2
9 1 1 0
10 1.6 0 1
Could anyone help me manipulate the data in this way? Thanks!
Upvotes: 1
Views: 275
Reputation: 323386
Just want to provide a new way
import pandas as pd
import numpy as np
import functools
idx=df.index[df['value_at_1'].eq(1)].values.tolist()+[len(df)]
idx=list(np.diff(idx))
df['count_since_1']=functools.reduce(lambda x,y: x+y,[list(range(y)) for y in idx])
df
Out[945]:
id value value_at_1 count_since_1
0 1 1.0 1 0
1 2 2.5 0 1
2 3 1.1 0 2
3 4 1.4 0 3
4 5 1.5 0 4
5 6 1.0 1 0
6 7 1.5 0 1
7 8 3.0 0 2
8 9 1.0 1 0
9 10 1.6 0 1
Upvotes: 2
Reputation: 294526
Use cumsum
in a groupby
to get cumcount
df.assign(
count_since_1=df.value_at_1.groupby(df.value_at_1.cumsum()).cumcount())
id value value_at_1 count_since_1
0 1 1.0 1 0
1 2 2.5 0 1
2 3 1.1 0 2
3 4 1.4 0 3
4 5 1.5 0 4
5 6 1.0 1 0
6 7 1.5 0 1
7 8 3.0 0 2
8 9 1.0 1 0
9 10 1.6 0 1
Upvotes: 5