Reputation: 968
I have a Dataframe df
, where I am trying to use groupby
and nlargest
together but am having trouble getting the output I want:
Date Symbol Count
0 2013 Q2 AAAA 8
1 2013 Q2 BBBB 3
2 2013 Q2 CCCC 8
3 2013 Q2 DDDD 4
4 2013 Q2 EEEE 5
5 2013 Q3 AAAA 22
6 2013 Q3 BBBB 11
7 2013 Q3 CCCC 3
8 2013 Q3 DDDD 15
9 2013 Q3 EEEE 2
10 2013 Q3 FFFF 10
I want to use groupby
to select by date
, then get the top 2 symbols
by count
for that date, so that the output looks like:
Date Symbol Count
0 2013 Q2 AAAA 8
2 2013 Q2 CCCC 8
5 2013 Q3 AAAA 22
8 2013 Q3 DDDD 15
But when I try df = df.groupby(['Date'])['Count'].nlargest(2)
, the output looks something like
2013 Q2 0 8
2 8
2013 Q3 5 22
8 15
Upvotes: 1
Views: 1909
Reputation: 42916
We can sort_values
by Count
then use Groupby.head
to get the top 2 rows per group:
dfn = df.sort_values('Count', ascending=False).groupby('Date').head(2).sort_index()
Date Symbol Count
0 2013 Q2 AAAA 8
2 2013 Q2 CCCC 8
5 2013 Q3 AAAA 22
8 2013 Q3 DDDD 15
Using apply
with isin
dfn = df[df.groupby('Date')['Count'].apply(lambda x: x.isin(x.nlargest(2)))]
Date Symbol Count
0 2013 Q2 AAAA 8
2 2013 Q2 CCCC 8
5 2013 Q3 AAAA 22
8 2013 Q3 DDDD 15
Note: this method will return multiple rows if your top 2 values consist of multiple rows.
Upvotes: 3
Reputation: 18367
Alternatively, perform the groupby
with nlargest()
and then merge it back:
print(df.merge(df.groupby('Date')['Count'].nlargest(2)).drop_duplicates())
Output:
Date Symbol Count
0 2013 Q2 A 8
2 2013 Q2 C 8
4 2013 Q3 A 22
5 2013 Q3 D 15
Upvotes: 0