Pandas Groupby nlargest (unique nlargest)

Question

I have a following pandas df:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/01/2019                F                  2
01/01/2019                G                  0
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/02/2019                E                  3
01/02/2019                F                  2
01/02/2019                G                  1
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8
01/03/2019                D                  7
01/03/2019                E                  6
01/03/2019                F                  5
01/03/2019                G                  4

I am trying to obtain the top three quantities with their respective labels per date, keeping duplicate quantity and labels.

This is my code:

df = df.groupby('date').apply(lambda x: x.nlargest(3, ['quantity'], keep='all')).reset_index(drop=True)

Should there be a duplicated quantity like in 01/01/2019 where A, B and C each has 5 quantity, D and E will not be included, however I will want D and E to be included. I will want a unique nlargest(3) kind of thing.

This is my desired outcome where I have the 3 largest distinct quantity in my df regardless if more than 1 label shares the same quantity:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8

Thank you in advance!

Mykola Zotko · Accepted Answer

You can find unique values for each group, get the list with three max values and select rows which are in this list. Then you can join the rest of the table:

df1 = df.groupby('date')['quantity'].\
apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])]).\
reset_index(level=0).join(df['label'])

print(df1)

Output:

          date  quantity label
0   01/01/2019         5     A
1   01/01/2019         5     B
2   01/01/2019         5     C
3   01/01/2019         4     D
4   01/01/2019         3     E
7   01/02/2019         7     A
8   01/02/2019         6     B
9   01/02/2019         5     C
10  01/02/2019         5     D
14  01/03/2019        10     A
15  01/03/2019         9     B
16  01/03/2019         8     C

Pandas Groupby nlargest (unique nlargest)

Answers (2)

Related Questions