Andi
Andi

Reputation: 4855

Extract pandas column names based on conditions

Based on a pandas DataFrame df, I performed a ranking, which can be seen in rank_df.

Now, I would like to create a new DataFrame results that consists of three columns ["first", "second", "third"]. This DataFrame should be filled with the respective column names of rank_df. For example, the first row of results could potentially comprise ['ticker_3', 'ticker_1', 'ticker_4']. To put it differently, column first of results should always contain the column name of rank_df, which has the highest ranking. And so on...

import numpy as np
import pandas as pd

np.random.seed(123)

cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(np.random.rand(3, 5), columns=cols)
df

Output:

   ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0  0.696469  0.286139  0.226851  0.551315  0.719469
1  0.423106  0.980764  0.684830  0.480932  0.392118
2  0.343178  0.729050  0.438572  0.059678  0.398044

Generate rank_df:

rank_df = df.rank(axis=1, method="first", ascending=False)
rank_df

Output:

   ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0       2.0       4.0       5.0       3.0       1.0
1       4.0       1.0       2.0       3.0       5.0
2       4.0       1.0       2.0       5.0       3.0

Need to generate results,

# NaNs in this final DataFrame needs to be filled with the respective column names
results = pd.DataFrame(None, index=rank_df.index, columns=["first", "second", "third"])

Upvotes: 2

Views: 63

Answers (2)

Scott Boston
Scott Boston

Reputation: 153460

Another way is using pandas reshaping:

rank_df.reset_index().melt('index').pivot('index', 'value', 'variable')\
       .rename(columns={1.0:'first', 2.0:'second', 3.0:'third'}).iloc[:, :3]

Output:

value     first    second     third
index                              
0      ticker_5  ticker_1  ticker_4
1      ticker_2  ticker_3  ticker_4
2      ticker_2  ticker_3  ticker_5

Upvotes: 2

anky
anky

Reputation: 75080

IIUC , you can try with argsort:

print(df)
    ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036

results[:] = df.columns.to_numpy()[np.argsort(-df)][:,:3] #change 3 to n as reqd
print(results)

      first    second     third
0  ticker_2  ticker_3  ticker_1
1  ticker_4  ticker_3  ticker_1
2  ticker_4  ticker_1  ticker_3

Upvotes: 3

Related Questions