EvaJ

Reputation: 23

Out of a specified number of columns, find which have the top 3 largest values in each row

I am looking to find the 3 largest values for each row in my DataFrame, but only from certain columns (i.e. there are 10 columns in total, but I only want 6 of them considered when calculating the largest values). Once the 3 largest are found, I want to create three new columns in my existing DataFrame called 'Top 1', 'Top 2' and 'Top 3'. I am using Pandas in Python.

This is my code:

df_2 = pd.DataFrame(
    df_1.apply(
        lambda x: list(
            df_1.columns[np.array(x).argsort()[::-1][:3]]
        ), axis=1
    ).to_list(), columns=['Top1', 'Top2', 'Top3']
)

I am getting an error message because this code considers my whole dataset when I only want to look at the columns ['t1', 't2', 't3', 't4', 't5', 't6']. Where would I enter this specification in my code?

Upvotes: 0

Views: 51

Answers (2)

Anurag Dabas

Reputation: 24322

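Try a list comprehension with the DataFrame() constructor and the nlargest() method, applied row by row:

col = ['t1', 't2', 't3', 't4', 't5', 't6']

# nlargest(3) on a row keeps its 3 largest values; their index holds the column names
out = pd.DataFrame(
    [row.nlargest(3).index.tolist() for _, row in df_1[col].iterrows()],
    columns=['Top1', 'Top2', 'Top3']
)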

OR

Modify your method a little:

df_2 = pd.DataFrame(
    df_1[col].apply(
        lambda x: list(
            # x.index holds the names of the selected columns, so the
            # argsort positions map to the right names
            x.index[np.array(x).argsort()[::-1][:3]]
        ), axis=1
    ).to_list(), columns=['Top1', 'Top2', 'Top3']
)
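
One thing to watch with the apply approach: the positions returned by argsort refer to the selected subset, so the names should be looked up in that subset (for example via the row's own index), not in df_1.columns. Here is a small sketch with made-up data. The 'name' column and the helper top3_names are hypothetical, only col and the Top1..Top3 names come from the question:

```python
import numpy as np
import pandas as pd

# Made-up data; 'name' is a non-numeric column that must be excluded
df_1 = pd.DataFrame({
    'name': ['a', 'b'],
    't1': [5, 1], 't2': [9, 2], 't3': [3, 8],
    't4': [7, 6], 't5': [1, 4], 't6': [2, 9],
})
col = ['t1', 't2', 't3', 't4', 't5', 't6']

def top3_names(row):
    # row.index holds the subset's column names, so argsort positions line up
    order = np.argsort(row.to_numpy())[::-1][:3]
    return list(row.index[order])

df_2 = pd.DataFrame(
    df_1[col].apply(top3_names, axis=1).to_list(),
    columns=['Top1', 'Top2', 'Top3'],
)
print(df_2)
```

Because 'name' sits before the t-columns here, looking the positions up in df_1.columns would shift every result by one column.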

Upvotes: 0

jezrael

Reputation: 863801

To improve performance, don't use apply, because it loops under the hood:

cols = ['t1', 't2', 't3', 't4', 't5', 't6']

df = pd.DataFrame(np.array(cols)[np.argsort(-df[cols].to_numpy(), axis=1)[:, :3]], 
                  columns=['Top1', 'Top2', 'Top3'])
print(df)
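
If the goal is to add the three columns to the existing DataFrame instead of building a new one, the same idea can be joined back on the original index. This is only a sketch under assumptions: the sample values and the 'id' column are invented for illustration, and ties are broken by whatever order argsort happens to return.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'id' stands in for the columns that should be ignored
df = pd.DataFrame({
    'id': [10, 11],
    't1': [5, 1], 't2': [9, 2], 't3': [3, 8],
    't4': [7, 6], 't5': [1, 4], 't6': [2, 9],
})
cols = ['t1', 't2', 't3', 't4', 't5', 't6']

# Positions of the 3 largest values per row (negate so argsort is descending)
top3 = np.argsort(-df[cols].to_numpy(), axis=1)[:, :3]

# Map positions back to column names; reuse df.index so the join aligns row by row
names = pd.DataFrame(np.array(cols)[top3],
                     columns=['Top1', 'Top2', 'Top3'],
                     index=df.index)
df = df.join(names)
print(df)
```

Passing index=df.index matters when the original DataFrame does not have a default 0..n-1 index, otherwise the join would misalign the rows.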

Upvotes: 1
