Reputation: 745
I have a groupby dataframe and I would like to return the top 3 groups with the highest value count.
for e.g the below dataframe expected output table should be group
20,30 and 33
I wanted to display a raw dataset table, but the group function was not properly displayed on SO, That was I uploaded an image.
amount cosine_group
cosine_group
0 952.5 0
4 3000.0 4
20 2000.0 20
2000.0 20
2000.0 20
27 2000.0 27
30 2100.0 30
2100.0 30
2100.0 30
33 1065.0 33
1065.0 33
1065.0 33
1065.0 33
Expected Output:
amount cosine_group
cosine_group
20 2000.0 20
2000.0 20
2000.0 20
30 2100.0 30
2100.0 30
2100.0 30
33 1065.0 33
1065.0 33
1065.0 33
1065.0 33
Upvotes: 1
Views: 476
Reputation: 23217
You can use .nlargest(3)
to select the 3 largest size. Use .isin()
to match for those rows with these values. Finally, use .loc
to return the rows in original dataframe of the largest elements, as follows:
df = df.rename_axis(index='cosine_group0') # to rename index axis name
df.loc[df['cosine_group'].isin(df.groupby('cosine_group', as_index=False)['cosine_group'].size().nlargest(3, 'size')['cosine_group'].tolist())]
Or use:
df = df.rename_axis(index='cosine_group0') # to rename index axis
df.loc[df["cosine_group"].isin(df["cosine_group"].value_counts().nlargest(3).index)]
Upvotes: 3
Reputation: 697
This may not be very pythonic but definitely gets the work done.
# retieve the index of the value counts
cosine_group_value = df["cosine_group"].value_counts().index
# get the fist 3 values from the value counts (highest 3 values)
top3 = list(cosine_group_value)[:3]
# filter your dataframe using the top 3 values on the cosine_group column
df = df[df["cosine_group"].isin(top3)]
Upvotes: 1