I have a dataset of the following form:
Id Class
1 a
2 b
2 c
3 c
3 d
3 a
3 e
3 f
4 g
I need to prepare this data for multi-label classification, so I use:
df.groupby("Id").Class.apply(','.join).reset_index()
to get:
Id Class
1 a
2 b,c
3 c,d,e,f
4 g
MultiLabelBinarizer cannot process this directly, because df.Class holds one comma-joined string per Id:
("a", "b,c", "c,d,e,f", "g")
whereas it expects one list of labels per Id:
[["a"], ["b","c"], ["c","d","e","f"],["g"]]
How can I convert the column into that form?
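For reference, here is a minimal sketch of two ways the column could be put into the expected shape. It assumes df is a pandas DataFrame built from the sample data above and that MultiLabelBinarizer is the one from sklearn.preprocessing. The variable names (grouped, joined, split_labels, encoded) are made up for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Rebuild the sample data from the question (assumed to be a DataFrame).
df = pd.DataFrame({
    "Id":    [1, 2, 2, 3, 3, 3, 3, 3, 4],
    "Class": ["a", "b", "c", "c", "d", "a", "e", "f", "g"],
})

# Option 1: collect each Id's classes into a list instead of joining them.
grouped = df.groupby("Id")["Class"].apply(list).reset_index()

# Option 2: if only the comma-joined strings exist, split them back into lists.
joined = df.groupby("Id")["Class"].apply(",".join).reset_index()
split_labels = joined["Class"].str.split(",")

# Either column is now a sequence of label lists, which the binarizer accepts.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(grouped["Class"])  # one row per Id, one column per class
```

After fitting, mlb.classes_ holds the column order of the encoded matrix. Option 1 avoids the round trip through strings, which may matter if a class name could itself contain a comma.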