Reputation: 427
For example the data look like:
df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}
x=pd.DataFrame(df)
Out[260]:
a1 a2 a3 a4 a5 a6
0 5 23 4 1 4 5
1 6 43 2 2 98 43
2 3 56 3 1 23 3
3 2 2 6 3 5 2
4 5 6 7 2 7 5
I need the result to look like:
top1 top2 top3
a2 a1 a6
a5 a2 a6
....
I've seen answer to a previous questions (see below) that recommends idxmax. But how to handle top n values (n>1)?
Find the column name which has the maximum value for each row
Update:
I find the answer very useful but the only thing is that my data is long so have to figure out a way to bypass that. I ended up saving the data to a csv file and then reading it back in in chunks. here is the code I used:
data = pd.read_csv('xxx.csv', chunksize=1000)
rslt = pd.DataFrame(np.zeros((0,3)), columns=['top1','top2','top3'])
for chunk in data:
x=pd.DataFrame(chunk).T
for i in x.columns:
df1row = pd.DataFrame(x.nlargest(3, i).index.tolist(), index=['top1','top2','top3']).T
rslt = pd.concat([rslt, df1row], axis=0)
rslt=rslt.reset_index(drop=True)
Upvotes: 11
Views: 8671
Reputation: 1721
You can do it like this:
x.T.apply(lambda x: x.sort_values(ascending=False).index).T.filter(['a1','a2','a3']).rename(columns={"a1":'top1',"a2":'top2',"a3":'top3'})
Results:
top1 top2 top3
0 a2 a6 a1
1 a5 a6 a2
2 a2 a5 a6
3 a3 a5 a4
4 a5 a3 a2
Upvotes: 1
Reputation: 1822
import pandas as pd
import numpy as np
df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}
df=pd.DataFrame(df)
df
a1 a2 a3 a4 a5 a6
0 5 23 4 1 4 5
1 6 43 2 2 98 43
2 3 56 3 1 23 3
3 2 2 6 3 5 2
4 5 6 7 2 7 5
We can solve it using the argsort
from numpy
and apply
, lambda
from pandas
.
The solution:
Tops =pd.DataFrame(df.apply(lambda x:list(df.columns[np.array(x).argsort()[::-1][:3]]), axis=1).to_list(), columns=['Top1', 'Top2', 'Top3'])
Tops
And we get:
Top1 Top2 Top3
0 a2 a6 a1
1 a5 a6 a2
2 a2 a5 a6
3 a3 a5 a4
4 a5 a3 a2
Upvotes: 11
Reputation: 2110
an answer without the need to transpose your data.
first we define how to get the n largest value index from a list (or a row).
def get_n_largest_ind(a,n):
ind = np.argpartition(a, -n)[-n:]
return ind[0]
a = [0.4,0.5,0.4,0.8,0.2,0.1,0.99]
get_n_largest_ind(a,3)
>>>> 1
Now you can define which columns to use for the ranking. Then we loop over the number of top columns you want to show (in this example 3).
cols = df.columns
for n in [1,2,3]:
df["top{}".format(n)] = df[cols].apply(lambda x: cols[get_n_largest_ind(x,n)],axis=1)
Upvotes: 0
Reputation: 1809
What you need is pandas.DataFrame.nlargest.
import pandas as pd
import numpy as np
df={'a1':[5,6,3,2,5],'a2':[23,43,56,2,6], 'a3':[4,2,3,6,7], 'a4':[1,2,1,3,2],'a5':[4,98,23,5,7],'a6':[5,43,3,2,5]}
x=pd.DataFrame(df).T
rslt = pd.DataFrame(np.zeros((0,3)), columns=['top1','top2','top3'])
for i in x.columns:
df1row = pd.DataFrame(x.nlargest(3, i).index.tolist(), index=['top1','top2','top3']).T
rslt = pd.concat([rslt, df1row], axis=0)
print rslt
Out[52]:
top1 top2 top3
0 a2 a1 a6
0 a5 a2 a6
0 a2 a5 a1
0 a3 a5 a4
0 a3 a5 a2
Upvotes: 6