beflyguy
beflyguy

Reputation: 135

Returning highest values in row

I have a dataframe with % values. I want to write a function that returns the names of the columns in succession for the highest % per row. For example:

Input dataframe:

Customer ID          Grapes        Oranges       Apples         Bananas     
12345                12.131        39.123        97.847         18.442
67890                11.111        1.111         100.000        40.941

Output dataframe:

Customer ID          Rec 1         Rec 2         Rec 3          Rec 4       ...
12345                Apples        Oranges       Bananas        Grapes
67890                Apples        Bananas       Grapes         Oranges
...

I want this function to work for N amount of rows and columns, as the dataframes are updated/appended to on a regular basis. Any ideas/built in functionality that can do this to some extent? Like an descending order method that goes by row?

Upvotes: 2

Views: 67

Answers (2)

jezrael
jezrael

Reputation: 862406

Use numpy.argsort if performance is important:

N = 4
df = df.set_index('Customer ID')
df1 = pd.DataFrame(df.columns[np.argsort(-df.values, axis=1)[:, :N]], 
                               index=df.index)

df1 = df1.rename(columns=lambda x: 'Rec {}'.format(x + 1))
print (df1)

              Rec 1    Rec 2    Rec 3    Rec 4
Customer ID                                   
12345        Apples  Oranges  Bananas   Grapes
67890        Apples  Bananas   Grapes  Oranges

N = 2
df = df.set_index('Customer ID')
df1 = pd.DataFrame(df.columns[np.argsort(-df.values, axis=1)[:, :N]], 
                               index=df.index)

df1 = df1.rename(columns=lambda x: 'Rec {}'.format(x + 1))
print (df1)

              Rec 1    Rec 2
Customer ID                 
12345        Apples  Oranges
67890        Apples  Bananas

Details:

print ((np.argsort(-df.values, axis=1)[:, :N]))
[[2 1 3 0]
 [2 3 0 1]]

print (df.columns[np.argsort(-df.values, axis=1)[:, :N]])
Index([['Apples', 'Oranges', 'Bananas', 'Grapes'],
       ['Apples', 'Bananas', 'Grapes', 'Oranges']], dtype='object')

Upvotes: 3

BENY
BENY

Reputation: 323226

Using melt with pivot

newdf=df.melt('CustomerID').sort_values(['value'],ascending=False)
newdf.assign(key=newdf.groupby(['CustomerID']).cumcount()+1).pivot(index='CustomerID',columns='key',values='variable')
Out[96]: 
key              1        2        3        4
CustomerID                                   
12345       Apples  Oranges  Bananas   Grapes
67890       Apples  Bananas   Grapes  Oranges

Upvotes: 2

Related Questions