Reputation: 1815
I want to encode [[1, 2], [4]]
to
[[0, 1, 1, 0, 0],
[0, 0, 0, 0, 1]]
whereas sklearn.preprocessing.MultiLabelBinarizer
only gives
[[1, 1, 0],
[0, 0, 1]]
Does anyone know how to do this with a built-in NumPy, pandas, or sklearn function?
Upvotes: 1
Views: 695
Reputation: 36619
MultiLabelBinarizer only knows about the labels you pass to it. When it sees only 3 distinct classes, it creates only 3 columns.
You need to set the classes parameter to the full list of classes you expect in your dataset, in the order you want the columns to appear:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=[0,1,2,3,4])
mlb.fit_transform([[1, 2], [4]])
# Output:
array([[0, 1, 1, 0, 0],
       [0, 0, 0, 0, 1]])
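Since the question also asks about NumPy, here is a minimal sketch of the same encoding done by hand. It assumes the labels are non-negative integers and that you know the total number of classes up front (5 here). The helper name multi_hot is hypothetical, not a NumPy or sklearn function.

```python
import numpy as np

def multi_hot(label_lists, n_classes):
    """Hypothetical helper: build a 0/1 indicator matrix from lists of integer labels."""
    out = np.zeros((len(label_lists), n_classes), dtype=int)
    for row, labels in enumerate(label_lists):
        # Integer-array indexing sets every listed column of this row to 1.
        out[row, np.asarray(labels, dtype=int)] = 1
    return out

encoded = multi_hot([[1, 2], [4]], n_classes=5)
print(encoded)
```

Unlike MultiLabelBinarizer, this sketch does no validation beyond NumPy's own bounds check, so a label greater than or equal to n_classes raises an IndexError.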
Upvotes: 2