Reputation: 51
I have a table like this:
name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate
I want to create a column like this:
df['count'] = [1,2,1,2,3,1,2,3,1]
Explanation: the 'count' column resets to 1 when there's a new name, OR when fill rate increases; otherwise, 'count' column equals to the last value plus 1.
It's easy to do it using loops, but I'd like to avoid this since the data is huge. Is there an alternative way to do it?
Upvotes: 1
Views: 1005
Reputation: 323236
IIUC let us combine shift
with diff
and using cumsum
create the sub-group, cumcount
s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]:
0 1
1 2
2 1
3 2
4 3
5 1
6 2
7 3
8 1
dtype: int64
Upvotes: 5