Reputation: 75
How do I groupby a column and get the top 10 records in each of the categories in that column?
The column I want to groupby has 3 categories 'high', 'med' and 'low'.
I have another column with numeric data that I'm using to rank the data.
Here is the head of my dataframe:
country  designation   points  province  title                      year  price  price_category
Italy    Vulkà Bianco  98      Sicily    Nicosia 2013 Vulkà Bianco  2013  65     high
The code below returns the top 2 rows per category by the numeric column, but it drops all the other columns. Is there a way to do this and keep the other columns?
df.groupby('price_category')['points'].nlargest(2)
Here is my output; all the other columns are gone:
category_column
high             36528     100
                 42197     100
low              5011       95
                 15196      95
med              114981     97
                 9901       96
I need this ^ but without losing my other columns.
Upvotes: 3
Views: 8741
Reputation: 1424
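Use the following to get the original row labels of the top rows (level 1 of the resulting MultiIndex):
df.groupby('price_category')['points'].nlargest(2).index.get_level_values(1)
Then select those rows from the dataframe. Because these values are index labels rather than positions, use .loc instead of .iloc:
df.loc[df.groupby('price_category')['points'].nlargest(2).index.get_level_values(1)]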
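Here is a minimal sketch of this approach on made-up data. The frame, its values, and the names top_labels and top_rows are hypothetical, not the asker's. The index is deliberately non-default to show that level 1 of the nlargest result holds row labels, which is why label-based .loc is used for the lookup.

```python
import pandas as pd

# Hypothetical sample data with a non-default index
df = pd.DataFrame(
    {
        "country": ["Italy", "France", "Spain", "Chile", "Italy", "France"],
        "points": [98, 91, 88, 95, 90, 93],
        "price_category": ["high", "high", "high", "low", "low", "low"],
    },
    index=[10, 11, 12, 13, 14, 15],
)

# Level 0 of the result's index is the group key, level 1 is the original row label
top_labels = (
    df.groupby("price_category")["points"].nlargest(2).index.get_level_values(1)
)

# Select full rows by label so that every column is kept
top_rows = df.loc[top_labels]
print(top_rows)
```

With a default 0..n-1 index, labels and positions coincide, so .iloc would happen to give the same rows; with any other index, only .loc is safe.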
Upvotes: 1
Reputation: 13401
You need:
import pandas as pd

# Small example frame with three categories in 'level'
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'level': ['low', 'high', 'low', 'medium', 'medium', 'high',
                             'low', 'high', 'medium', 'high', 'medium', 'low'],
                   'values': [23, 43, 56, 12, 34, 32, 18, 109, 345, 21, 15, 45]})
# use nlargest(10) for your problem.
print(df.groupby('level')['values'].nlargest(2))
Output:
level
high    7     109
        1      43
low     2      56
        11     45
medium  8     345
        4      34
Upvotes: 1
Reputation: 562
This has been asked and answered before on Stack Overflow, in the question "pandas groupby sort within groups". What you have to do is group the dataframe and aggregate with a sum, which gives you a new aggregated column. Then create a second group-by on that aggregated column and use .nlargest on it, as described in that post.
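A rough sketch of those steps, assuming made-up data in which each category contains several rows per country (the frame, its values, and the names totals and top are hypothetical). Note that this ranks aggregated sums within each category rather than individual rows.

```python
import pandas as pd

# Hypothetical data: several rows per (price_category, country) pair
df = pd.DataFrame({
    "price_category": ["high", "high", "high", "high", "low", "low", "low", "low"],
    "country": ["Italy", "Italy", "France", "Spain", "Chile", "Chile", "Spain", "Italy"],
    "points": [98, 90, 95, 88, 85, 86, 80, 84],
})

# First group-by: sum the points for each (price_category, country) pair
totals = df.groupby(["price_category", "country"])["points"].sum()

# Second group-by on the aggregated series, using its price_category index level,
# then keep the 2 largest sums within each category
top = totals.groupby(level="price_category", group_keys=False).nlargest(2)
print(top)
```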
Upvotes: 0