John_gis
John_gis

Reputation: 127

Using pandas to perform groupby on a dataframe, sort by the count and get the top 2 counts in python

I'm working on dataframe and want to group by a column (ID), show the respective labels and count for each of them. How do I then get the top 2 Labels for each ID in the dataframe using python?

data= 
ID  Label
A   Apple
B   Apple
B   Apple
C   Banana
C   Pear
A   Pear
B   Apple
C   Apple
A   Banana
A   Apple
C   Pear
A   Banana
B   Pear
B   Pear
B   Banana
C   Apple

I have been able to group by the ID and Label and also get the count for each but i cant get the last part of getting just the top 2 for each ID.

data.groupby(['ID','Label']).size().reset_index(name='counts')

this gives me this table:

ID  Label   counts
A   Apple   2
A   Banana  2
A   Pear    1
B   Apple   3
B   Banana  1
B   Pear    2
C   Apple   2
C   Banana  1
C   Pear    2

The expected result i want is:

ID  Label   counts
A   Apple   2
    Banana  2
B   Apple   3
    Pear    2
C   Apple   2
    Pear    2

Upvotes: 3

Views: 319

Answers (3)

piRSquared
piRSquared

Reputation: 294488

pd.concat

pd.concat({k: d.Label.value_counts().head(2) for k, d in data.groupby('ID')})

A  Apple     2
   Banana    2
B  Apple     3
   Pear      2
C  Apple     2
   Pear      2
Name: Label, dtype: int64

Tucked and Tied

pd.concat(
    {k: d.Label.value_counts().head(2) for k, d in data.groupby('ID')},
    names=['ID', 'Label']
).reset_index(name='counts')

  ID   Label  counts
0  A   Apple       2
1  A  Banana       2
2  B   Apple       3
3  B    Pear       2
4  C   Apple       2
5  C    Pear       2

Upvotes: 1

jezrael
jezrael

Reputation: 863226

Use SeriesGroupBy.value_counts which return sorted values, so added GroupBy.head:

df = (data.groupby('ID')['Label']
          .value_counts()
          .groupby(level=0)
          .head(2)
          .reset_index(name='counts'))
print (df)
  ID   Label  counts
0  A   Apple       2
1  A  Banana       2
2  B   Apple       3
3  B    Pear       2
4  C   Apple       2
5  C    Pear       2

Or use custom lambda function:

df = (data.groupby('ID')['Label']
          .apply(lambda x: x.value_counts().head(2))
          .reset_index(name='counts'))

Upvotes: 4

BENY
BENY

Reputation: 323326

If you only need the head two(Top 2)

data.groupby(['ID']).Label.value_counts().groupby(level=0).head(2)
Out[770]: 
ID  Label 
A   Apple     2
    Banana    2
B   Apple     3
    Pear      2
C   Apple     2
    Pear      2
Name: Label, dtype: int64

Upvotes: 6

Related Questions