Reputation: 275
I have a pivoted dataframe with customer recommendations that looks like the following:
I want to reshape this dataframe or form another new dataframe containing CustomerID column and 4 new additional columns called 'Recommendation#1','Recommendation#2','Recommendation#3' and 'Recommendation#4', where each column will have the top 4 highest scored items for each customer. I am trying to make it look something like this:
However I am a bit lost in how to approach this in python to obtain my wanted dataframe from my pivoted dataframe. Any help or suggestions on how to approach this would be very appreciated!
Upvotes: 1
Views: 59
Reputation: 88226
You could use argsort
and keep the highest n
elements in each row:
n=4
data = df.values.argsort().T[::-1].T[:,:n]
And build a dataframe from the obtained array, and columns that you specify:
columns = ['Recommendation#{}'.format(i) for i in range(1,data.shape[1]+1)]
df = pd.DataFrame(df.columns[data], columns=columns, index = df.index)
print(df)
Recommendation#1 Recommendation#2 Recommendation#3 Recommendation#4
ID1 Item4 Item2 Item8 Item6
ID2 Item5 Item1 Item4 Item6
ID3 Item7 Item6 Item8 Item3
Upvotes: 2
Reputation: 4137
First the get the dataframe values
, then sort them with numpy.sort
, reverse the order and get the 4 first entries. Then set this data into a new dataframe:
import pandas as pd
d = {'I1':[0,5,0],'I2':[5,0,0],'I3':[1.2,0,3],'I4':[5,4,0],'I5':[2,5,0],'I6':[3,3,5],'I7':[0,0,5],'I8':[3,0,4]}
df = pd.DataFrame(d)
print(df)
# I1 I2 I3 I4 I5 I6 I7 I8
# 0 0 5 1.2 5 2 3 0 3
# 1 5 0 0.0 4 5 3 0 0
# 2 0 0 3.0 0 0 5 5 4
a = df.values
a = a.argsort(axis=1)
a = a[:,::-1][:,:4]+1
df2 = pd.DataFrame(columns=['Rec1', 'Rec2', 'Rec3', 'Rec4'], data=a)
df2[list(df2)] = 'I' + df2[list(df2)].astype(str)
print(df2)
# Rec1 Rec2 Rec3 Rec4
# 0 I4 I2 I8 I6
# 1 I5 I1 I4 I6
# 2 I7 I6 I8 I3
Upvotes: 0
Reputation: 16147
import pandas as pd
import numpy as np
df = pd.DataFrame({'CustomerID':['ID1','ID2','ID3'],
'Item1':[0,5,0],
'Item2':[5,0,0],
'Item3':[1.2,0,3],
'Item4':[5,4,0],
'Item5':[2,5,0],
'Item6':[3,3,5],
'Item7':[0,0,5],
'Item8':[3,0,4]})
df.set_index('CustomerID', inplace=True)
df = df * -1
df2 = df.apply(np.sort, axis=1)
df2 = abs(df2)
Output
CustomerID
ID1 [5.0, 5.0, 3.0, 3.0, 2.0, 1.2, 0.0, 0.0]
ID2 [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ID3 [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]
Upvotes: -1