Praneeth Pragallapati

Reputation: 39

MultiLabelBinarizer not working for a column with multiple arrays

I have a column containing 15000 arrays; two sample records are shown below. I want to create dummy (indicator) columns for the values under Genres_relevant.

user Genres_relevant    
 1         [2.0]
 2     [3.0,2.0,1.0]

Code:

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
df=pd.DataFrame(users_list['Genres_relevant'])
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df),columns=mlb.classes_, index=df.index)

Expected output

   1.0  2.0  3.0
1   0    1    0
2   1    1    1

Error: The shape of passed values is (12, 1), indices imply (12, 15000)

Upvotes: 1

Views: 397

Answers (1)

mujjiga

Reputation: 16916

pd.DataFrame(mlb.fit_transform(df['Genres_relevant']),
             columns=mlb.classes_,
             index=df.index)

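When fitting, pass in the column (a Series of label lists) rather than the full DataFrame. Iterating over a DataFrame yields its column labels, not its rows, so the binarizer sees only the column name as a single sample and the result no longer lines up with df.index.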
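As a minimal sketch of the fix, here is the same call run on the two sample records from the question. The DataFrame below is rebuilt by hand for illustration (the original users_list is not shown, so its exact construction is an assumption), and the variable names encoded and dummies are only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hand-built stand-in for the question's data (assumed layout:
# one list of genre ids per user, indexed by user).
df = pd.DataFrame(
    {"Genres_relevant": [[2.0], [3.0, 2.0, 1.0]]},
    index=pd.Index([1, 2], name="user"),
)

# Iterating a DataFrame gives column labels, which is why passing
# the whole frame to the binarizer produces a single "sample".
column_labels = list(df)

mlb = MultiLabelBinarizer()

# Pass the column itself so each row's list is treated as one sample.
encoded = mlb.fit_transform(df["Genres_relevant"])

# classes_ holds the sorted unique labels and is used as the column names.
dummies = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)
print(dummies)
```

With the question's sample rows this should give one indicator column per genre id and one row per user, matching the expected output in the question.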

Upvotes: 2
