user3116949
user3116949

Reputation: 275

Reshaping/Transforming Pandas dataframe from pivoted table

I have a pivoted dataframe with customer recommendations that looks like the following:

enter image description here

I want to reshape this dataframe or form another new dataframe containing CustomerID column and 4 new additional columns called 'Recommendation#1','Recommendation#2','Recommendation#3' and 'Recommendation#4', where each column will have the top 4 highest scored items for each customer. I am trying to make it look something like this:

enter image description here

However I am a bit lost in how to approach this in python to obtain my wanted dataframe from my pivoted dataframe. Any help or suggestions on how to approach this would be very appreciated!

Upvotes: 1

Views: 59

Answers (3)

yatu
yatu

Reputation: 88226

You could use argsort and keep the highest n elements in each row:

n=4
data = df.values.argsort().T[::-1].T[:,:n] 

And build a dataframe from the obtained array, and columns that you specify:

columns = ['Recommendation#{}'.format(i) for i in range(1,data.shape[1]+1)]
df = pd.DataFrame(df.columns[data], columns=columns, index = df.index) 
print(df)

         Recommendation#1 Recommendation#2 Recommendation#3 Recommendation#4
ID1            Item4            Item2            Item8            Item6
ID2            Item5            Item1            Item4            Item6
ID3            Item7            Item6            Item8            Item3

Upvotes: 2

b-fg
b-fg

Reputation: 4137

First the get the dataframe values, then sort them with numpy.sort, reverse the order and get the 4 first entries. Then set this data into a new dataframe:

import pandas as pd

d = {'I1':[0,5,0],'I2':[5,0,0],'I3':[1.2,0,3],'I4':[5,4,0],'I5':[2,5,0],'I6':[3,3,5],'I7':[0,0,5],'I8':[3,0,4]}

df = pd.DataFrame(d)
print(df)
#    I1  I2   I3  I4  I5  I6  I7  I8
# 0   0   5  1.2   5   2   3   0   3
# 1   5   0  0.0   4   5   3   0   0
# 2   0   0  3.0   0   0   5   5   4
a = df.values
a = a.argsort(axis=1)
a = a[:,::-1][:,:4]+1
df2 = pd.DataFrame(columns=['Rec1', 'Rec2', 'Rec3', 'Rec4'], data=a)
df2[list(df2)] = 'I' + df2[list(df2)].astype(str)
print(df2)
#   Rec1 Rec2 Rec3 Rec4
# 0   I4   I2   I8   I6
# 1   I5   I1   I4   I6
# 2   I7   I6   I8   I3

Upvotes: 0

Chris
Chris

Reputation: 16147

import pandas as pd
import numpy as np
df = pd.DataFrame({'CustomerID':['ID1','ID2','ID3'],
                   'Item1':[0,5,0],
                   'Item2':[5,0,0],
                   'Item3':[1.2,0,3],
                   'Item4':[5,4,0],
                   'Item5':[2,5,0],
                   'Item6':[3,3,5],
                   'Item7':[0,0,5],
                   'Item8':[3,0,4]})

df.set_index('CustomerID', inplace=True)
df = df * -1
df2 = df.apply(np.sort, axis=1)
df2 = abs(df2)

Output

    CustomerID
ID1    [5.0, 5.0, 3.0, 3.0, 2.0, 1.2, 0.0, 0.0]
ID2    [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ID3    [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]

Upvotes: -1

Related Questions