Reputation: 73

A function that will return the index of the 3 highest entries entries in each column in a dataframe

If i have the following dataframe

           Base Pay   Overtime Pay    Other Pay   Benefits   
Adam        200000       31000          5000       64000
Ben         210000       27000          7000       57000
Scott       190000       40000          9000       65000
David       220000       26000          4000       61000
Matthew     195000       29000         10000       63000
Mark        205000       37000          8000       59000

Then i am looking to return the following dataframe

                     1st      2nd     3rd
Base Pay            David    Ben     Mark
Overtime Pay        Scott    Mark     Adam  
Other Pay          Matthew  Scott     Mark
Benefits            Scott    Adam    Matthew

I know how to compute the 3 largest values in each column, but not simultaneously.

Upvotes: 0

Answers (3)

Henry Yik

Reputation: 22503

Using argsort:

df = pd.DataFrame(...).T

result = pd.DataFrame(df.columns[(-df.values).argsort(axis=1)[:, :3]],
                      columns=["1st","2nd","3rd"],
                      index=df.index)

print (result)

#
                 1st    2nd      3rd
BasePay        David    Ben     Mark
OvertimePay    Scott   Mark     Adam
OtherPay     Matthew  Scott     Mark
Benefits       Scott   Adam  Matthew

Upvotes: 3

Adam Zeldin

Reputation: 908

A little shorter:

df = df.T.apply(lambda s: s.abs().nlargest(3).index.tolist(), axis=1)

>>> df2 = pd.DataFrame()
>>> df2[['1st','2nd','3rd']] = pd.DataFrame(df.values.tolist(), index= df.index)

Upvotes: 2

BENY

Reputation: 323276

Here is one way

s=df.stack().sort_values(ascending=False).groupby(level=1).head(3).reset_index()
s['Id']=s.groupby('level_1').cumcount()+1
s.pivot(index='level_1',columns='Id',values='level_0')
Out[114]: 
Id                 1      2        3
level_1                             
BasePay        David    Ben     Mark
Benefits       Scott   Adam  Matthew
OtherPay     Matthew  Scott     Mark
OvertimePay    Scott   Mark     Adam

Upvotes: 1

A function that will return the index of the 3 highest entries entries in each column in a dataframe

Answers (3)

Related Questions