Reputation: 577
I have a following pandas df:
date label quantity
01/01/2019 A 5
01/01/2019 B 5
01/01/2019 C 5
01/01/2019 D 4
01/01/2019 E 3
01/01/2019 F 2
01/01/2019 G 0
01/02/2019 A 7
01/02/2019 B 6
01/02/2019 C 5
01/02/2019 D 5
01/02/2019 E 3
01/02/2019 F 2
01/02/2019 G 1
01/03/2019 A 10
01/03/2019 B 9
01/03/2019 C 8
01/03/2019 D 7
01/03/2019 E 6
01/03/2019 F 5
01/03/2019 G 4
I am trying to obtain the top three quantities with their respective labels per date, keeping duplicate quantity and labels.
This is my code:
df = df.groupby('date').apply(lambda x: x.nlargest(3, ['quantity'], keep='all')).reset_index(drop=True)
Should there be a duplicated quantity like in 01/01/2019 where A, B and C each has 5 quantity, D and E will not be included, however I will want D and E to be included. I will want a unique nlargest(3) kind of thing.
This is my desired outcome where I have the 3 largest distinct quantity in my df regardless if more than 1 label shares the same quantity:
date label quantity
01/01/2019 A 5
01/01/2019 B 5
01/01/2019 C 5
01/01/2019 D 4
01/01/2019 E 3
01/02/2019 A 7
01/02/2019 B 6
01/02/2019 C 5
01/02/2019 D 5
01/03/2019 A 10
01/03/2019 B 9
01/03/2019 C 8
Thank you in advance!
Upvotes: 2
Views: 1009
Reputation: 17824
You can find unique values for each group, get the list with three max values and select rows which are in this list. Then you can join the rest of the table:
df1 = df.groupby('date')['quantity'].\
apply(lambda x: x[x.isin(np.sort(x.unique())[-3:])]).\
reset_index(level=0).join(df['label'])
print(df1)
Output:
date quantity label
0 01/01/2019 5 A
1 01/01/2019 5 B
2 01/01/2019 5 C
3 01/01/2019 4 D
4 01/01/2019 3 E
7 01/02/2019 7 A
8 01/02/2019 6 B
9 01/02/2019 5 C
10 01/02/2019 5 D
14 01/03/2019 10 A
15 01/03/2019 9 B
16 01/03/2019 8 C
Upvotes: 2
Reputation: 1065
def get_slice(df):
x_vals = df['quantity'].drop_duplicates().nlargest(3)
return df[df['quantity'].isin(x_vals)]
print(df.groupby('date').apply(get_slice).reset_index(drop=True))
Upvotes: 2