owise
owise

Reputation: 1065

Counting Precedant Entries of a column and creating a new varaible of these counts

I have a data frame and I want to count the number of consecutive entries of one column and record the counts in a separate variable. Here is an example:

ID    Class    
1      A        
1      A        
2      A        
1      B        
1      B        
1      B        
2      B        
1      C        
1      C        
2      A        
2      A        
2      A        

I want in each group ID to count the number of consecutive classes, so the output would look like this:

ID    Class   Counts
1      A       0
1      A       1
2      A       0
1      B       0
1      B       1
1      B       2
2      B       0
1      C       0
1      C       1
2      A       0
2      A       1
2      A       2

I am not looking the frequency of occurrence of a specific entries like here, rather the consecutive occurrences of an entry on the ID level

Upvotes: 1

Views: 26

Answers (1)

jezrael
jezrael

Reputation: 863166

You can use cumcount by Series which is create by cumsum of shifted concanecate values by shift:

#use separator which is not in data like _ or ¥
s = df['ID'].astype(str) + '¥' + df['Class']
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
    ID Class  Counts
0    1     A       0
1    1     A       1
2    2     A       0
3    1     B       0
4    1     B       1
5    1     B       2
6    2     B       0
7    1     C       0
8    1     C       1
9    2     A       0
10   2     A       1
11   2     A       2

Another solution with ngroup (pandas 0.20.2+):

s = df.groupby(['ID','Class']).ngroup()
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
    ID Class  Counts
0    1     A       0
1    1     A       1
2    2     A       0
3    1     B       0
4    1     B       1
5    1     B       2
6    2     B       0
7    1     C       0
8    1     C       1
9    2     A       0
10   2     A       1
11   2     A       2

Upvotes: 3

Related Questions