Selecting rows with top n values based on multiple columns

Question

I have a dataframe like the one below where all IDs are unique and columns A, B and C hold values between 0 and 1.

df = pd.DataFrame({'A':[1, 0.7, 0, 0.5, 0.3, 0.3], 'B' :[0.6, 0.1, 0.4, 0.3, 0.9, 0.3], 'C':[0.6, 0.3, 0.6, 0.8, 0.9, 0.5], 'ID':['a', 'b', 'c', 'd', 'e', 'f']} )

A	B	C	ID
1	0.6	0.6	a
0.7	0.1	0.3	b
0	0.4	0.6	c
0.5	0.3	0.8	d
0.3	0.9	0.9	e
0.3	0.3	0.5	f

I want to keep just the top n values of A, B and C, so that for n = 2 the dataframe looks as follow:

A	B	C	ID
1	0.6	NaN	a
0.7	NaN	NaN	b
NaN	NaN	0.8	d
NaN	0.9	0.9	e

Doing df.set_index('ID')['A'].nlargest(2).reset_index() gives me:

ID	A
a	1
b	0.7

Is there a simpler way than doing this 3 times and joining the datasets?

BENY · Accepted Answer

Try with nlargest

out = df.set_index('ID').apply(lambda x : x.nlargest(n=2))
Out[654]: 
      A    B    C
ID               
a   1.0  0.6  NaN
b   0.7  NaN  NaN
d   NaN  NaN  0.8
e   NaN  0.9  0.9

Selecting rows with top n values based on multiple columns

Answers (1)

Related Questions