pandas - top count items after groupby on multiple columns

Question

I have dataframe data grouped by two columns (X, Y) and then I have count of elements in Z. Idea here is to find the top 2 counts of elements across X, Y.

Dataframe should look like:

mostCountYInX = df.groupby(['X','Y'],as_index=False).count()
 C      X   Y     Z
USA    NY  NY    5
USA    NY  BR    14
USA    NJ  JC    40
USA    FL  MI    3
IND    MAH MUM   4
IND    KAR BLR   2
IND    KER TVM   2
CHN    HK  HK    3
CHN    SH  SH    3

Individually, I can extract the information I am looking for:

XTopCountInTopY = mostCountYInX[mostCountYInX['X'] == 'NY']
XTopCountInTopY = XTopCountInTopY.nlargest(2,'Y')

In the above I knew group I am looking for which is X = NY and got the top 2 records. Is there a way to print them together?

Say I am interested in IND and USA then the Output expected:

 C     X   Y     Z
USA    NJ  JC    40
USA    NY  BR    14
IND    MAH MUM   4
IND    KAR BLR   2

Space Impact · Accepted Answer

I think you need groupby on index with parameter sort=False then apply using lambda function and sort_values on Z using parameter ascending=False then take top 2 values and reset_index as:

mask = df.index.isin(['USA','IND'])
df = df[mask].groupby(df[mask].index,sort=False).\
              apply(lambda x: x.sort_values('Z',ascending=False)[:2]).\
              reset_index(level=0,drop=True)
print(df)

       X    Y   Z
USA   NJ   JC  40
USA   NY   BR  14
IND  MAH  MUM   4
IND  KAR  BLR   2

EDIT : After OP changed the Dataframe:

mask = df['C'].isin(['USA','IND'])
df = df[mask].groupby('C',sort=False).\
              apply(lambda x: x.sort_values('Z',ascending=False)[:2]).\
              reset_index(drop=True)

print(df)
     C    X    Y   Z
0  USA   NJ   JC  40
1  USA   NY   BR  14
2  IND  MAH  MUM   4
3  IND  KAR  BLR   2

pandas - top count items after groupby on multiple columns

Answers (1)

Related Questions