atjw94
atjw94

Reputation: 577

Pandas Groupby nlargest (unique nlargest)

I have a following pandas df:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/01/2019                F                  2
01/01/2019                G                  0
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/02/2019                E                  3
01/02/2019                F                  2
01/02/2019                G                  1
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8
01/03/2019                D                  7
01/03/2019                E                  6
01/03/2019                F                  5
01/03/2019                G                  4

I am trying to obtain the top three quantities with their respective labels per date, keeping duplicate quantity and labels.

This is my code:

df = df.groupby('date').apply(lambda x: x.nlargest(3, ['quantity'], keep='all')).reset_index(drop=True)

Should there be a duplicated quantity like in 01/01/2019 where A, B and C each has 5 quantity, D and E will not be included, however I will want D and E to be included. I will want a unique nlargest(3) kind of thing.

This is my desired outcome where I have the 3 largest distinct quantity in my df regardless if more than 1 label shares the same quantity:

date                    label              quantity
01/01/2019                A                  5
01/01/2019                B                  5
01/01/2019                C                  5
01/01/2019                D                  4
01/01/2019                E                  3
01/02/2019                A                  7
01/02/2019                B                  6
01/02/2019                C                  5
01/02/2019                D                  5
01/03/2019                A                  10
01/03/2019                B                  9
01/03/2019                C                  8

Thank you in advance!

Upvotes: 2

Views: 1009

Answers (2)

Mykola Zotko
Mykola Zotko

Reputation: 17824

You can find unique values for each group, get the list with three max values and select rows which are in this list. Then you can join the rest of the table:

df1 = df.groupby('date')['quantity'].\
apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])]).\
reset_index(level=0).join(df['label'])

print(df1)

Output:

          date  quantity label
0   01/01/2019         5     A
1   01/01/2019         5     B
2   01/01/2019         5     C
3   01/01/2019         4     D
4   01/01/2019         3     E
7   01/02/2019         7     A
8   01/02/2019         6     B
9   01/02/2019         5     C
10  01/02/2019         5     D
14  01/03/2019        10     A
15  01/03/2019         9     B
16  01/03/2019         8     C

Upvotes: 2

Oleg O
Oleg O

Reputation: 1065

def get_slice(df):
    x_vals = df['quantity'].drop_duplicates().nlargest(3)
    return df[df['quantity'].isin(x_vals)]

print(df.groupby('date').apply(get_slice).reset_index(drop=True))

Upvotes: 2

Related Questions