How to use Pandas Groupby and nlargest

Question

I have a Dataframe df, where I am trying to use groupby and nlargest together but am having trouble getting the output I want:

           Date Symbol  Count
0       2013 Q2   AAAA      8
1       2013 Q2   BBBB      3
2       2013 Q2   CCCC      8
3       2013 Q2   DDDD      4
4       2013 Q2   EEEE      5
5       2013 Q3   AAAA     22
6       2013 Q3   BBBB     11
7       2013 Q3   CCCC      3
8       2013 Q3   DDDD     15
9       2013 Q3   EEEE      2
10      2013 Q3   FFFF     10

I want to use groupby to select by date, then get the top 2 symbols by count for that date, so that the output looks like:

           Date Symbol  Count
0       2013 Q2   AAAA      8
2       2013 Q2   CCCC      8
5       2013 Q3   AAAA     22
8       2013 Q3   DDDD     15

But when I try df = df.groupby(['Date'])['Count'].nlargest(2), the output looks something like

2013 Q2  0   8
         2   8
2013 Q3  5   22
         8   15

Erfan · Accepted Answer

Method 1:

We can sort_values by Count then use Groupby.head to get the top 2 rows per group:

dfn = df.sort_values('Count', ascending=False).groupby('Date').head(2).sort_index()

      Date Symbol  Count
0  2013 Q2   AAAA      8
2  2013 Q2   CCCC      8
5  2013 Q3   AAAA     22
8  2013 Q3   DDDD     15

Method 2:

Using apply with isin

dfn = df[df.groupby('Date')['Count'].apply(lambda x: x.isin(x.nlargest(2)))]

      Date Symbol  Count
0  2013 Q2   AAAA      8
2  2013 Q2   CCCC      8
5  2013 Q3   AAAA     22
8  2013 Q3   DDDD     15

Note: this method will return multiple rows if your top 2 values consist of multiple rows.

How to use Pandas Groupby and nlargest

Answers (2)

Method 1:

Method 2:

Related Questions