Reputation: 285

Reshaping/Transforming Pandas dataframe from pivoted table

I have a pivoted dataframe with customer recommendations that looks like the following:

I want to reshape this dataframe or form another new dataframe containing CustomerID column and 4 new additional columns called 'Recommendation#1','Recommendation#2','Recommendation#3' and 'Recommendation#4', where each column will have the top 4 highest scored items for each customer. I am trying to make it look something like this:

However I am a bit lost in how to approach this in python to obtain my wanted dataframe from my pivoted dataframe. Any help or suggestions on how to approach this would be very appreciated!

Upvotes: 1

Answers (3)

yatu

Reputation: 88305

You could use argsort and keep the highest n elements in each row:

n=4
data = df.values.argsort().T[::-1].T[:,:n]

And build a dataframe from the obtained array, and columns that you specify:

columns = ['Recommendation#{}'.format(i) for i in range(1,data.shape[1]+1)]
df = pd.DataFrame(df.columns[data], columns=columns, index = df.index) 
print(df)

         Recommendation#1 Recommendation#2 Recommendation#3 Recommendation#4
ID1            Item4            Item2            Item8            Item6
ID2            Item5            Item1            Item4            Item6
ID3            Item7            Item6            Item8            Item3

Upvotes: 2

b-fg

Reputation: 4137

First the get the dataframe values, then sort them with numpy.sort, reverse the order and get the 4 first entries. Then set this data into a new dataframe:

import pandas as pd

d = {'I1':[0,5,0],'I2':[5,0,0],'I3':[1.2,0,3],'I4':[5,4,0],'I5':[2,5,0],'I6':[3,3,5],'I7':[0,0,5],'I8':[3,0,4]}

df = pd.DataFrame(d)
print(df)
#    I1  I2   I3  I4  I5  I6  I7  I8
# 0   0   5  1.2   5   2   3   0   3
# 1   5   0  0.0   4   5   3   0   0
# 2   0   0  3.0   0   0   5   5   4
a = df.values
a = a.argsort(axis=1)
a = a[:,::-1][:,:4]+1
df2 = pd.DataFrame(columns=['Rec1', 'Rec2', 'Rec3', 'Rec4'], data=a)
df2[list(df2)] = 'I' + df2[list(df2)].astype(str)
print(df2)
#   Rec1 Rec2 Rec3 Rec4
# 0   I4   I2   I8   I6
# 1   I5   I1   I4   I6
# 2   I7   I6   I8   I3

Upvotes: 0

Chris

Reputation: 16172

import pandas as pd
import numpy as np
df = pd.DataFrame({'CustomerID':['ID1','ID2','ID3'],
                   'Item1':[0,5,0],
                   'Item2':[5,0,0],
                   'Item3':[1.2,0,3],
                   'Item4':[5,4,0],
                   'Item5':[2,5,0],
                   'Item6':[3,3,5],
                   'Item7':[0,0,5],
                   'Item8':[3,0,4]})

df.set_index('CustomerID', inplace=True)
df = df * -1
df2 = df.apply(np.sort, axis=1)
df2 = abs(df2)

Output

    CustomerID
ID1    [5.0, 5.0, 3.0, 3.0, 2.0, 1.2, 0.0, 0.0]
ID2    [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]
ID3    [5.0, 5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0]

Upvotes: -1

Reshaping/Transforming Pandas dataframe from pivoted table

Answers (3)

Related Questions