Sam
Sam

Reputation: 89

Selecting rows with top n values based on multiple columns

I have a dataframe like the one below where all IDs are unique and columns A, B and C hold values between 0 and 1.

df = pd.DataFrame({'A':[1, 0.7, 0, 0.5, 0.3, 0.3], 'B' :[0.6, 0.1, 0.4, 0.3, 0.9, 0.3], 'C':[0.6, 0.3, 0.6, 0.8, 0.9, 0.5], 'ID':['a', 'b', 'c', 'd', 'e', 'f']} )
A B C ID
1 0.6 0.6 a
0.7 0.1 0.3 b
0 0.4 0.6 c
0.5 0.3 0.8 d
0.3 0.9 0.9 e
0.3 0.3 0.5 f

I want to keep just the top n values of A, B and C, so that for n = 2 the dataframe looks as follow:

A B C ID
1 0.6 NaN a
0.7 NaN NaN b
NaN NaN 0.8 d
NaN 0.9 0.9 e

Doing df.set_index('ID')['A'].nlargest(2).reset_index() gives me:

ID A
a 1
b 0.7

Is there a simpler way than doing this 3 times and joining the datasets?

Upvotes: 0

Views: 104

Answers (1)

BENY
BENY

Reputation: 323376

Try with nlargest

out = df.set_index('ID').apply(lambda x : x.nlargest(n=2))
Out[654]: 
      A    B    C
ID               
a   1.0  0.6  NaN
b   0.7  NaN  NaN
d   NaN  NaN  0.8
e   NaN  0.9  0.9

Upvotes: 3

Related Questions