danche

Reputation: 1815

How to encode a multi-label representation using label indices?

I want to encode [[1, 2], [4]] to

[[0, 1, 1, 0, 0],
[0, 0, 0, 0, 1]]

while sklearn.preprocessing.MultiLabelBinarizer only gives

[[1, 1, 0],
[0, 0, 1]]

Does anyone know how to do this using NumPy, pandas, or a built-in sklearn function?

Upvotes: 1

Views: 695

Answers (1)

Vivek Kumar

Reputation: 36619

MultiLabelBinarizer only knows about the labels you pass to it. When it sees only 3 distinct classes, it creates only 3 columns.
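To see this, here is a small sketch that fits the binarizer without any arguments on the data from the question and inspects the fitted classes_ attribute. The variable names mlb_default and default_encoded are just illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fit with default settings: the classes are inferred from the data alone.
mlb_default = MultiLabelBinarizer()
default_encoded = mlb_default.fit_transform([[1, 2], [4]])

# classes_ holds the distinct labels seen during fit, one per output column.
print(mlb_default.classes_)
print(default_encoded.shape)
```

Only the labels 1, 2 and 4 appear in the data, so the result has three columns and no columns for 0 or 3.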

You need to set the classes parameter to the full list of classes you expect in your dataset, in the order you want the columns to appear:

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer(classes=[0,1,2,3,4])
mlb.fit_transform([[1, 2], [4]])

# Output
array([[0, 1, 1, 0, 0],
       [0, 0, 0, 0, 1]])
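Since the question also asks about NumPy, here is a minimal sketch of the same encoding without sklearn, assuming the labels are non-negative integers that can be used directly as column indices. The helper name multi_hot and its n_classes argument are hypothetical, not part of any library.

```python
import numpy as np

def multi_hot(label_lists, n_classes=None):
    """Encode lists of integer label indices as a 0/1 matrix (hypothetical helper)."""
    if n_classes is None:
        # Assumption: infer the width from the largest label present.
        n_classes = max(max(labels) for labels in label_lists if labels) + 1
    out = np.zeros((len(label_lists), n_classes), dtype=int)
    for row, labels in enumerate(label_lists):
        # Set the columns named by this sample's labels to 1.
        out[row, np.asarray(labels, dtype=np.intp)] = 1
    return out

encoded = multi_hot([[1, 2], [4]])
print(encoded)
```

Passing n_classes explicitly plays the same role as the classes parameter above: it fixes the number of columns even when some labels never occur in the data.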

Upvotes: 2
