Cyril YU
Cyril YU

Reputation: 51

Pandas: counting consecutive rows with condition

I have a table like this:

name = ['a','a','a','a','a','b','b','b','b']
fillrate = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.3, 0.3, 0.4]
df = pd.DataFrame(name)
df.columns = ['name']
df['fillrate'] = fillrate

I want to create a column like this:

df['count'] = [1,2,1,2,3,1,2,3,1]

Explanation: the 'count' column resets to 1 when there's a new name, OR when fill rate increases; otherwise, 'count' column equals to the last value plus 1.

It's easy to do it using loops, but I'd like to avoid this since the data is huge. Is there an alternative way to do it?

Upvotes: 1

Views: 1005

Answers (1)

BENY
BENY

Reputation: 323236

IIUC let us combine shift with diff and using cumsum create the sub-group, cumcount

s=(df.name.ne(df.name.shift()) | df.fillrate.diff().gt(0)).cumsum()
s.groupby(s).cumcount()+1
Out[17]: 
0    1
1    2
2    1
3    2
4    3
5    1
6    2
7    3
8    1
dtype: int64

Upvotes: 5

Related Questions