Reputation: 2208
I have dataframe data grouped by two columns (X, Y) and then I have count of elements in Z. Idea here is to find the top 2 counts of elements across X, Y.
Dataframe should look like:
mostCountYInX = df.groupby(['X','Y'],as_index=False).count()
C X Y Z
USA NY NY 5
USA NY BR 14
USA NJ JC 40
USA FL MI 3
IND MAH MUM 4
IND KAR BLR 2
IND KER TVM 2
CHN HK HK 3
CHN SH SH 3
Individually, I can extract the information I am looking for:
XTopCountInTopY = mostCountYInX[mostCountYInX['X'] == 'NY']
XTopCountInTopY = XTopCountInTopY.nlargest(2,'Y')
In the above I knew group I am looking for which is X = NY and got the top 2 records. Is there a way to print them together?
Say I am interested in IND and USA then the Output expected:
C X Y Z
USA NJ JC 40
USA NY BR 14
IND MAH MUM 4
IND KAR BLR 2
Upvotes: 0
Views: 658
Reputation: 13255
I think you need groupby
on index with parameter sort=False
then apply
using lambda function
and sort_values
on Z
using parameter ascending=False
then take top 2 values and reset_index
as:
mask = df.index.isin(['USA','IND'])
df = df[mask].groupby(df[mask].index,sort=False).\
apply(lambda x: x.sort_values('Z',ascending=False)[:2]).\
reset_index(level=0,drop=True)
print(df)
X Y Z
USA NJ JC 40
USA NY BR 14
IND MAH MUM 4
IND KAR BLR 2
EDIT : After OP changed the Dataframe:
mask = df['C'].isin(['USA','IND'])
df = df[mask].groupby('C',sort=False).\
apply(lambda x: x.sort_values('Z',ascending=False)[:2]).\
reset_index(drop=True)
print(df)
C X Y Z
0 USA NJ JC 40
1 USA NY BR 14
2 IND MAH MUM 4
3 IND KAR BLR 2
Upvotes: 1