Reputation: 127
I'm working on dataframe and want to group by a column (ID
), show the respective labels and count for each of them. How do I then get the top 2 Labels for each ID in the dataframe using python?
data=
ID Label
A Apple
B Apple
B Apple
C Banana
C Pear
A Pear
B Apple
C Apple
A Banana
A Apple
C Pear
A Banana
B Pear
B Pear
B Banana
C Apple
I have been able to group by the ID and Label and also get the count for each but i cant get the last part of getting just the top 2 for each ID.
data.groupby(['ID','Label']).size().reset_index(name='counts')
this gives me this table:
ID Label counts
A Apple 2
A Banana 2
A Pear 1
B Apple 3
B Banana 1
B Pear 2
C Apple 2
C Banana 1
C Pear 2
The expected result i want is:
ID Label counts
A Apple 2
Banana 2
B Apple 3
Pear 2
C Apple 2
Pear 2
Upvotes: 3
Views: 319
Reputation: 294488
pd.concat
pd.concat({k: d.Label.value_counts().head(2) for k, d in data.groupby('ID')})
A Apple 2
Banana 2
B Apple 3
Pear 2
C Apple 2
Pear 2
Name: Label, dtype: int64
pd.concat(
{k: d.Label.value_counts().head(2) for k, d in data.groupby('ID')},
names=['ID', 'Label']
).reset_index(name='counts')
ID Label counts
0 A Apple 2
1 A Banana 2
2 B Apple 3
3 B Pear 2
4 C Apple 2
5 C Pear 2
Upvotes: 1
Reputation: 863226
Use SeriesGroupBy.value_counts
which return sorted values, so added GroupBy.head
:
df = (data.groupby('ID')['Label']
.value_counts()
.groupby(level=0)
.head(2)
.reset_index(name='counts'))
print (df)
ID Label counts
0 A Apple 2
1 A Banana 2
2 B Apple 3
3 B Pear 2
4 C Apple 2
5 C Pear 2
Or use custom lambda function:
df = (data.groupby('ID')['Label']
.apply(lambda x: x.value_counts().head(2))
.reset_index(name='counts'))
Upvotes: 4
Reputation: 323326
If you only need the head
two(Top 2)
data.groupby(['ID']).Label.value_counts().groupby(level=0).head(2)
Out[770]:
ID Label
A Apple 2
Banana 2
B Apple 3
Pear 2
C Apple 2
Pear 2
Name: Label, dtype: int64
Upvotes: 6