Pandas DataFrame rolling count

Question

I have the following pandas dataframe (just an example):

import pandas as pd
df = pd.DataFrame(pd.Series(['a','a','a','b','b','c','c','c','c','b','c','a']), columns = ['Data'])

The goal is to add another column, Stats, that counts consecutive elements of the Data column, as follows:

   Data Stats
0     a      
1     a      
2     a    a3
3     b      
4     b    b2
5     c      
6     c      
7     c      
8     c    c4
9     b    b1
10    c    c1
11    a    a1

Here, for example, a3 means "three consecutive a elements", c4 means "four consecutive c elements", and so on.

Thank you in advance for your help

jpp · Accepted Answer

Here's one way using groupby:

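import numpy as np

# Label each run of equal values, then number the rows within each run from 1.
counts = df.groupby((df['Data'] != df['Data'].shift()).cumsum()).cumcount() + 1

# Write value + run length on the last row of each run, empty string elsewhere.
df['Stats'] = np.where(df['Data'] != df['Data'].shift(-1),
                       df['Data'] + counts.astype(str), '')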

print(df)

   Data Stats
0     a      
1     a      
2     a    a3
3     b      
4     b    b2
5     c      
6     c      
7     c      
8     c    c4
9     b    b1
10    c    c1
11    a    a1
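
To see why this works, it may help to split the one-liner into its intermediate steps. The sketch below rebuilds the same example frame; the names starts, run_id and is_last are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Data': list('aaabbccccbca')})

# True wherever a value differs from the one before it, i.e. a new run starts.
starts = df['Data'] != df['Data'].shift()

# The cumulative sum of the run starts gives every row a run id: 1,1,1,2,2,3,...
run_id = starts.cumsum()

# Position of each row within its run, counted from 1.
counts = df.groupby(run_id).cumcount() + 1

# True on the last row of each run (the next value differs or is missing).
is_last = df['Data'] != df['Data'].shift(-1)

# On the last row of a run, the within-run position equals the run length.
df['Stats'] = np.where(is_last, df['Data'] + counts.astype(str), '')

# Show the helper columns next to the result for inspection.
print(df.assign(run_id=run_id, counts=counts, is_last=is_last))
```

Because cumcount numbers rows within each run, its value on the last row of a run is the run's length, which is why only those rows need a label.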
