I have a matrix of values. I want to rank the values within each column, then set the top N ranked values in each column to 1 and all others to 0.
I have tried to do this using nlargest and head, but the only solution I can figure out is to apply mask twice.
My solution is below, but is there a smarter way to do this?
Many thanks,
John
```python
import pandas as pd

df = pd.DataFrame([(1, 2, 3),
                   (4, 5, 6),
                   (7, 8, 9),
                   (11, 21, 31),
                   (41, 51, 31),
                   (71, 51, 61),
                   (71, 81, 91)],
                  columns=('value_1', 'value_2', 'value_3'))
```
|   | value_1 | value_2 | value_3 |
|---|---|---|---|
| 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 |
| 2 | 7 | 8 | 9 |
| 3 | 11 | 21 | 31 |
| 4 | 41 | 51 | 31 |
| 5 | 71 | 51 | 61 |
| 6 | 71 | 81 | 91 |
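```python
N = 3  # arbitrary cut-off
# Rank each column in descending order; tied values share the lowest (best) rank
df = df.rank(ascending=False, axis=0, method='min')
df.mask(df > N, 0, inplace=True)  # zero out ranks below the cut-off
df.mask(df > 0, 1, inplace=True)  # i.e. values not previously masked
```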
Resulting df:

|   | value_1 | value_2 | value_3 |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 |
| 3 | 0 | 0 | 1 |
| 4 | 1 | 1 | 1 |
| 5 | 1 | 1 | 1 |
| 6 | 1 | 1 | 1 |
Try building the boolean mask directly and then converting it with astype(int):
```python
(~(df.rank(ascending=False, axis=0, method='min') > N)).astype(int)
```
```
   value_1  value_2  value_3
0        0        0        0
1        0        0        0
2        0        0        0
3        0        0        1
4        1        1        1
5        1        1        1
6        1        1        1
```
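For data without NaNs, ~(ranks > N) is the same test as ranks <= N, so the expression can be written a little more directly (with NaNs the two differ: rank leaves NaN, and the negated form would mark those cells as 1). The sketch below also shows the nlargest route mentioned in the question: Series.nlargest(N, keep='all') keeps every value tied with the N-th largest, which gives the same selection as method='min' ranking with a cut-off of N. The helper names top_n_by_rank and top_n_by_nlargest are hypothetical, chosen only for this illustration.

```python
import pandas as pd


def top_n_by_rank(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """1 where a value ranks in the top n of its column, else 0 (ties share the best rank)."""
    ranks = df.rank(ascending=False, axis=0, method='min')
    return (ranks <= n).astype(int)


def top_n_by_nlargest(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Same selection via nlargest; keep='all' retains every value tied with the n-th largest."""
    # df.apply works column by column (axis=0 by default)
    return df.apply(lambda col: col.isin(col.nlargest(n, keep='all')).astype(int))


df = pd.DataFrame([(1, 2, 3),
                   (4, 5, 6),
                   (7, 8, 9),
                   (11, 21, 31),
                   (41, 51, 31),
                   (71, 51, 61),
                   (71, 81, 91)],
                  columns=('value_1', 'value_2', 'value_3'))
N = 3  # arbitrary cut-off

print(top_n_by_rank(df, N))
print(top_n_by_nlargest(df, N))
```

The rank version is vectorised over the whole frame; the nlargest version runs a small Python function per column, which is fine for a handful of columns.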