Reputation: 2317
I have this dataset:
Id query count
001 abc 20
001 bcd 30
001 ccd 100
002 ace 13
002 ahhd 30
002 ahe 28
I want to find the Top2 query for each Id, based on the count. So I want to see:
Id query count
001 ccd 100
001 bcd 30
002 ahhd 30
002 ahe 28
I tried these two lines of code:
df.groupby('Id')['count'].nlargest(2)
, the "query" column is lost in the result, which is not what I wanted. So how to keep query in my result.
Id count
001 100
001 30
002 30
002 28
Upvotes: 1
Views: 1044
Reputation: 294318
I use a groupby
and apply
the method pd.DataFrame.nlargest
. This differs from pd.Series.nlargest
in that I have to specify a set of columns to consider when choosing my n rows. This solution keeps the original index values that are attached to the rows, if that is at all important to the OP or end user.
df.groupby('Id', group_keys=False).apply(
pd.DataFrame.nlargest, n=2, columns='count')
Id query count
2 1 ccd 100
1 1 bcd 30
4 2 ahhd 30
5 2 ahe 28
Upvotes: 2
Reputation: 2293
You could do this with groupby still:
df.sort_values('count', ascending = False).groupby('Id').head(2)
Upvotes: 1