GuyGuyGuy

Reputation: 75

Select top 10 records for each category python

How do I groupby a column and get the top 10 records in each of the categories in that column?

The column I want to groupby has 3 categories 'high', 'med' and 'low'.

I have another column with numeric data that I'm using to rank the data.

Here is the head of my dataframe:

country   designation     points    province               title             year    price   price_category
Italy     Vulkà Bianco     98        Sicily     Nicosia 2013 Vulkà Bianco    2013     65     high

My code here returns the top 2 from the numeric column, but I'm losing all the other columns. Is there a way to do it without losing the other columns?

df.groupby('price_category')['points'].nlargest(2)

Here is my output; I've lost all the other columns:

category_column        
high        36528     100
            42197     100
low         5011       95
            15196      95
med         114981     97
            9901       96

I need this ^ but without losing my other columns.

Upvotes: 3

Views: 8741

Answers (3)

rahul-ahuja

Reputation: 1424

Use the following to get the original row labels of the top records (the second level of the resulting index):

df.groupby('price_category')['points'].nlargest(2).index.get_level_values(1)

Then select those rows from the dataframe by label. Use .loc rather than .iloc here, because these values are index labels, not positions:

df.loc[df.groupby('price_category')['points'].nlargest(2).index.get_level_values(1)]
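A minimal sketch of the two steps above, on made-up data (the rows and the non-default index are assumptions for illustration; only the column names come from the question). The values returned by get_level_values(1) are index labels, so the sketch selects with label-based .loc; with a default RangeIndex, labels and positions happen to coincide.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame(
    {
        "title": ["a", "b", "c", "d", "e", "f"],
        "points": [98, 91, 95, 88, 97, 90],
        "price_category": ["high", "high", "low", "low", "med", "med"],
    },
    index=[10, 11, 12, 13, 14, 15],  # deliberately not 0..n-1
)

# Step 1: top record per category; the result is indexed by
# (price_category, original row label).
top = df.groupby("price_category")["points"].nlargest(1)

# Step 2: pull out the original row labels and select whole rows by label.
rows = top.index.get_level_values(1)
result = df.loc[rows]

print(result)
```

All original columns are kept because the selection is made on the full dataframe rather than on the 'points' series.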

Upvotes: 1

Sociopath

Reputation: 13401

You need:

import pandas as pd

df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10,11,12],
                   'level':['low','high','low','medium','medium','high','low','high','medium','high','medium','low'],
                   'values':[23,43,56,12,34,32,18,109,345,21,15,45]})

# use nlargest(10) for your problem.
print(df.groupby('level')['values'].nlargest(2))

Output:

level
high    7     109
        1      43
low     2      56
        11     45
medium  8     345
        4      34
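The output above still contains only the 'values' column. One possible way to keep every column, sketched here on the same sample data, is to sort the whole dataframe first and then take the first rows of each group with head. This is a substitute technique, not part of the answer's code, and it may pick different rows than nlargest when values are tied.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'level': ['low', 'high', 'low', 'medium', 'medium', 'high',
                             'low', 'high', 'medium', 'high', 'medium', 'low'],
                   'values': [23, 43, 56, 12, 34, 32, 18, 109, 345, 21, 15, 45]})

# Sort all rows by the ranking column, then keep the first 2 rows per group.
# head() returns complete rows, so 'id' and 'level' are preserved.
top2 = (df.sort_values('values', ascending=False)
          .groupby('level')
          .head(2))

print(top2)
```

For the question's data, the same pattern would use 'points' and 'price_category' with head(10).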

Upvotes: 1

Raj006

Reputation: 562

This was asked before and answered here on Stack Overflow, in the question "pandas groupby sort within groups". First, group the data frame and aggregate with a sum to get a new aggregated column. Then group again on that aggregated result and use .nlargest, as described in that post.
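A rough sketch of the aggregate-then-rank pattern described above. The data and the choice of 'province' as the second grouping key are assumptions made for illustration. Note that this ranks per-group totals, so it fits cases where a sum is wanted first rather than the per-row top N from the question.

```python
import pandas as pd

# Hypothetical data; column names follow the question, values are made up.
df = pd.DataFrame({
    "price_category": ["high", "high", "high", "high", "low", "low", "low"],
    "province": ["Sicily", "Sicily", "Tuscany", "Piedmont",
                 "Sicily", "Tuscany", "Tuscany"],
    "points": [98, 95, 90, 99, 85, 80, 82],
})

# Step 1: first group-by, aggregating with sum.
totals = df.groupby(["price_category", "province"])["points"].sum()

# Step 2: second group-by on the aggregated series, keeping the
# 2 largest totals within each price_category.
top = totals.groupby(level="price_category", group_keys=False).nlargest(2)

print(top)
```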

Upvotes: 0
