Reputation: 89
I have a dataframe like the one below where all IDs are unique and columns A, B and C hold values between 0 and 1.
df = pd.DataFrame({'A':[1, 0.7, 0, 0.5, 0.3, 0.3], 'B' :[0.6, 0.1, 0.4, 0.3, 0.9, 0.3], 'C':[0.6, 0.3, 0.6, 0.8, 0.9, 0.5], 'ID':['a', 'b', 'c', 'd', 'e', 'f']} )
A | B | C | ID |
---|---|---|---|
1 | 0.6 | 0.6 | a |
0.7 | 0.1 | 0.3 | b |
0 | 0.4 | 0.6 | c |
0.5 | 0.3 | 0.8 | d |
0.3 | 0.9 | 0.9 | e |
0.3 | 0.3 | 0.5 | f |
I want to keep just the top n values of A, B and C, so that for n = 2 the dataframe looks as follow:
A | B | C | ID |
---|---|---|---|
1 | 0.6 | NaN | a |
0.7 | NaN | NaN | b |
NaN | NaN | 0.8 | d |
NaN | 0.9 | 0.9 | e |
Doing df.set_index('ID')['A'].nlargest(2).reset_index() gives me:
ID | A |
---|---|
a | 1 |
b | 0.7 |
Is there a simpler way than doing this 3 times and joining the datasets?
Upvotes: 0
Views: 104
Reputation: 323376
Try with nlargest
out = df.set_index('ID').apply(lambda x : x.nlargest(n=2))
Out[654]:
A B C
ID
a 1.0 0.6 NaN
b 0.7 NaN NaN
d NaN NaN 0.8
e NaN 0.9 0.9
Upvotes: 3