Reputation: 2666
If I one-hot encode a column with 3 possible values like this:
from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit([0, 1, 2])
lb.classes_
lb.transform([1, 0])
Then I get:
array([[0, 1, 0],
       [1, 0, 0]])
which is exactly what I want: 3 columns, 1 for each possible value.
But, if I have 2 possible values like this:
lb.fit([0, 1])
lb.classes_
lb.transform([1, 0])
I get:
array([[1],
       [0]])
which is only 1 column, even though I have 2 possible values. What I would like to end up with in this case is:
array([[0, 1],
       [1, 0]])
How can I get the 2 column result in this case?
Upvotes: 2
Views: 1823
Reputation: 2666
It looks like pandas.get_dummies is the easiest solution in my case:
import pandas as pd
pd.get_dummies([1, 0])
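One caveat, as far as I can tell: get_dummies only creates columns for values that actually occur in the input, so a batch containing only 0s would yield a single column. A minimal sketch of one way around that, assuming the full set of values is known up front, is to wrap the data in a pd.Categorical with explicit categories (the names `values` and `result` are just for illustration):

```python
import pandas as pd

# Declare every possible category explicitly so that a column is
# produced for each one, even if it is absent from this batch.
values = pd.Categorical([1, 0], categories=[0, 1])

dummies = pd.get_dummies(values)

# The dtype of the dummy columns differs between pandas versions
# (bool or uint8), so cast to int for a plain 0/1 array.
result = dummies.to_numpy().astype(int)
print(result)
```

The cast to int at the end is only there to make the output independent of the pandas version.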
Upvotes: 0
Reputation: 798
You can use OneHotEncoder. For example:
In [37]: oh = preprocessing.OneHotEncoder(sparse=False)
In [38]: oh.fit([[0], [1]])
Out[38]:
OneHotEncoder(categorical_features='all', dtype=<type 'float'>,
       handle_unknown='error', n_values=2, sparse=False)
In [39]: oh.transform([[1], [0]])
Out[39]:
array([[ 0.,  1.],
       [ 1.,  0.]])
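Note that the session above is from an older scikit-learn; in recent releases the `sparse` argument has been renamed `sparse_output`, so the constructor call may need adjusting. A version-agnostic sketch, assuming you are fine with converting the result yourself, is to leave the encoder at its default (sparse) output and call `.toarray()` on it (the name `encoded` is just for illustration):

```python
from sklearn.preprocessing import OneHotEncoder

# Use the default sparse output to avoid the version-dependent
# `sparse` / `sparse_output` keyword, then densify explicitly.
oh = OneHotEncoder()
oh.fit([[0], [1]])

# transform() returns a SciPy sparse matrix here; toarray() makes it dense.
encoded = oh.transform([[1], [0]]).toarray()
print(encoded)
```

Unlike LabelBinarizer, this keeps one column per category even in the binary case.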
Upvotes: 1