Reputation: 135
I have a dataframe with % values. I want to write a function that returns the names of the columns in succession for the highest % per row. For example:
Input dataframe:
Customer ID Grapes Oranges Apples Bananas
12345 12.131 39.123 97.847 18.442
67890 11.111 1.111 100.000 40.941
Output dataframe:
Customer ID Rec 1 Rec 2 Rec 3 Rec 4 ...
12345 Apples Oranges Bananas Grapes
67890 Apples Bananas Grapes Oranges
...
I want this function to work for N amount of rows and columns, as the dataframes are updated/appended to on a regular basis. Any ideas/built in functionality that can do this to some extent? Like an descending order method that goes by row?
Upvotes: 2
Views: 67
Reputation: 862406
Use numpy.argsort
if performance is important:
N = 4
df = df.set_index('Customer ID')
df1 = pd.DataFrame(df.columns[np.argsort(-df.values, axis=1)[:, :N]],
index=df.index)
df1 = df1.rename(columns=lambda x: 'Rec {}'.format(x + 1))
print (df1)
Rec 1 Rec 2 Rec 3 Rec 4
Customer ID
12345 Apples Oranges Bananas Grapes
67890 Apples Bananas Grapes Oranges
N = 2
df = df.set_index('Customer ID')
df1 = pd.DataFrame(df.columns[np.argsort(-df.values, axis=1)[:, :N]],
index=df.index)
df1 = df1.rename(columns=lambda x: 'Rec {}'.format(x + 1))
print (df1)
Rec 1 Rec 2
Customer ID
12345 Apples Oranges
67890 Apples Bananas
Details:
print ((np.argsort(-df.values, axis=1)[:, :N]))
[[2 1 3 0]
[2 3 0 1]]
print (df.columns[np.argsort(-df.values, axis=1)[:, :N]])
Index([['Apples', 'Oranges', 'Bananas', 'Grapes'],
['Apples', 'Bananas', 'Grapes', 'Oranges']], dtype='object')
Upvotes: 3
Reputation: 323226
Using melt
with pivot
newdf=df.melt('CustomerID').sort_values(['value'],ascending=False)
newdf.assign(key=newdf.groupby(['CustomerID']).cumcount()+1).pivot(index='CustomerID',columns='key',values='variable')
Out[96]:
key 1 2 3 4
CustomerID
12345 Apples Oranges Bananas Grapes
67890 Apples Bananas Grapes Oranges
Upvotes: 2