Dfeld
Dfeld

Reputation: 197

Running count based on multiple columns

I would like to create a running count of the values in the "Button" column (Start, Stop) within each "ID". However, any change in the "Button" value or a change of "ID" should reset the running count. The data frame is below:

data = pd.DataFrame({
'ID': ['A','A','B','B','C','C','C','C','C','D','E','E'], 
'Button': ['Start','Stop','Start','Stop','Start','Start','Stop','Start','Stop','Start','Start','Stop']
})

I can create a running count based on the "Button" values, but can't figure out how to group that by "ID".

data['runningCount'] = data.groupby(data['Button']).cumcount()+1

I'm looking for the following result:

result = pd.DataFrame({
'ID': ['A','A','B','B','C','C','C','C','C','D','E','E'], 
'Button': ['Start','Stop','Start','Stop','Start','Start','Stop','Start','Stop','Start','Start','Stop'],
'Count': [1,1,1,1,1,2,1,1,1,1,1,1]})

Upvotes: 3

Views: 813

Answers (1)

BENY
BENY

Reputation: 323316

You just need to create a specific key with shift

s=data.groupby('ID').Button.apply(lambda x : (x!=x.shift()).cumsum())
data.groupby([data.ID,s]).cumcount()+1
Out[189]: 
0     1
1     1
2     1
3     1
4     1
5     2
6     1
7     1
8     1
9     1
10    1
11    1
dtype: int64

More info

data.groupby('ID').Button.apply(lambda x : (x!=x.shift()).cumsum())
Out[192]: 
0     1
1     2
2     1
3     2
4     1
5     1
6     2
7     3
8     4
9     1
10    1
11    2
Name: Button, dtype: int32

Upvotes: 3

Related Questions