chuky pedro
chuky pedro

Reputation: 745

How to filter a groupby dataframe based on their values count

I have a groupby dataframe and I would like to return the top 3 groups with the highest value count.

for e.g the below dataframe expected output table should be group 20,30 and 33

I wanted to display a raw dataset table, but the group function was not properly displayed on SO, That was I uploaded an image.

                     amount         cosine_group
cosine_group            

0                   952.5              0
4                   3000.0             4    
20                  2000.0            20
                    2000.0            20
                    2000.0            20
27                  2000.0            27    

30                  2100.0            30
                    2100.0            30
                    2100.0            30
33                  1065.0            33
                    1065.0            33
                    1065.0            33
                    1065.0            33

Expected Output:

                     amount         cosine_group
cosine_group            

20                  2000.0            20
                    2000.0            20
                    2000.0            20

30                  2100.0            30
                    2100.0            30
                    2100.0            30
33                  1065.0            33
                    1065.0            33
                    1065.0            33
                    1065.0            33

enter image description here

Upvotes: 1

Views: 476

Answers (2)

SeaBean
SeaBean

Reputation: 23217

You can use .nlargest(3) to select the 3 largest size. Use .isin() to match for those rows with these values. Finally, use .loc to return the rows in original dataframe of the largest elements, as follows:

df = df.rename_axis(index='cosine_group0')   # to rename index axis name
df.loc[df['cosine_group'].isin(df.groupby('cosine_group', as_index=False)['cosine_group'].size().nlargest(3, 'size')['cosine_group'].tolist())]

Or use:

df = df.rename_axis(index='cosine_group0')   # to rename index axis
df.loc[df["cosine_group"].isin(df["cosine_group"].value_counts().nlargest(3).index)]

Upvotes: 3

AmineBTG
AmineBTG

Reputation: 697

This may not be very pythonic but definitely gets the work done.

# retieve the index of the value counts
cosine_group_value = df["cosine_group"].value_counts().index

# get the fist 3 values  from the value counts (highest 3 values)
top3 = list(cosine_group_value)[:3]

# filter your dataframe using the top 3 values on the cosine_group column
df = df[df["cosine_group"].isin(top3)]

Upvotes: 1

Related Questions