Reputation: 69
i want to find the top n values in a row of a dataframe.
Practical example:
data = {'First': [1, 2,3],
'Second': [2,1,5],
'Third': [5,1,2]
}
df = pd.DataFrame (data, columns = ['First','Second','Third'])
First Second Third
0 1 2 5
1 2 1 3
2 3 5 2
I want to iterate through each row and select the top n values. In this example the top 2 and replace the values with 1 and all others with 0.
So my desired output would look like:
First Second Third
0 0 1 1
1 1 0 1
2 1 1 0
Upvotes: 2
Views: 1059
Reputation: 19312
You can use df.rank
with method set to min
and get the ones that are greater than a number of columns - 2
which is same as (df.shape[1]-n)
to get the top 2. Then astype
to int
-
data = {'First': [1, 2,3],
'Second': [2,1,5],
'Third': [5,3,2]
}
df = pd.DataFrame (data, columns = ['First','Second','Third'])
n = 2 #define top n
(df.rank(1)>(df.shape[1]-n)).astype(int)
First Second Third
0 0 1 1
1 1 0 1
2 1 1 0
Alternate: You can use numpy to solve this as well. The double argsort returns a rank order for each number row-wise. The threshold for top n will be one less in this case since ranking starts from 0 instead. -
new_data = (df.to_numpy().argsort(1).argsort()>(df.shape[1]-n-1)).astype(int)
df2 = pd.DataFrame(new_data, columns = ['First','Second','Third'])
print(df2)
First Second Third
0 0 1 1
1 1 0 1
2 1 1 0
Upvotes: 4
Reputation: 448
You can create a helper function for converting the values to 0 or 1 based on whether they fit in top2 or not.
Then this function can be applied to the dataframe via apply
method.
import pandas as pd
def filter_top_2(row):
top_2_items = sorted(row, reverse=True)[:2]
return [
1
if x in top_2_items
else 0
for x in row
]
data = {
'First': [1, 2, 3],
'Second': [2, 1, 5],
'Third': [5, 1, 2]
}
df = pd.DataFrame(data)
print(df)
print(df.apply(filter_top_2, axis=1, result_type='broadcast'))
Output
First Second Third
0 1 2 5
1 2 1 1
2 3 5 2
First Second Third
0 0 1 1
1 1 1 1
2 1 1 0
Upvotes: 1