Reputation: 1065
I have a data frame and I want to count the number of consecutive entries of one column and record the counts in a separate variable. Here is an example:
ID Class
1 A
1 A
2 A
1 B
1 B
1 B
2 B
1 C
1 C
2 A
2 A
2 A
I want in each group ID to count the number of consecutive classes, so the output would look like this:
ID Class Counts
1 A 0
1 A 1
2 A 0
1 B 0
1 B 1
1 B 2
2 B 0
1 C 0
1 C 1
2 A 0
2 A 1
2 A 2
I am not looking the frequency of occurrence of a specific entries like here, rather the consecutive occurrences of an entry on the ID level
Upvotes: 1
Views: 26
Reputation: 863166
You can use cumcount
by Series
which is create by cumsum
of shifted concanecate values by shift
:
#use separator which is not in data like _ or ¥
s = df['ID'].astype(str) + '¥' + df['Class']
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
Another solution with ngroup
(pandas 0.20.2+
):
s = df.groupby(['ID','Class']).ngroup()
df['Counts'] = df.groupby(s.ne(s.shift()).cumsum()).cumcount()
print (df)
ID Class Counts
0 1 A 0
1 1 A 1
2 2 A 0
3 1 B 0
4 1 B 1
5 1 B 2
6 2 B 0
7 1 C 0
8 1 C 1
9 2 A 0
10 2 A 1
11 2 A 2
Upvotes: 3