OlavT

Reputation: 2666

How to one-hot encode a column with two values?

If I one-hot encode a column with 3 possible values like this:

from sklearn import preprocessing
lb = preprocessing.LabelBinarizer()
lb.fit([0, 1, 2])
lb.classes_
lb.transform([1, 0])

Then I get:

array([[0, 1, 0],
       [1, 0, 0]])

which is exactly what I would like. 3 columns = 1 column for each possible value.

But, if I have 2 possible values like this:

lb.fit([0, 1])
lb.classes_
lb.transform([1, 0])

I get:

array([[1],
       [0]])

which is only 1 column, even though I have 2 possible values. What I would like to end up with in this case is:

array([[0, 1],
       [1, 0]])

How can I get the 2-column result in this case?

Upvotes: 2

Views: 1823

Answers (2)

OlavT

Reputation: 2666

It looks like pandas.get_dummies is the easiest solution in my case:

import pandas as pd
pd.get_dummies([1, 0])
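
A slightly fuller sketch of the same idea, in case a plain integer array is wanted. The variable names are only illustrative, and the `astype(int)` cast is an addition, since newer pandas versions return boolean columns by default. Note that `get_dummies` only creates columns for values that actually occur in the data; the second half assumes that declaring the categories up front is acceptable as a way to keep both columns.

```python
import pandas as pd

# Plain call: one column per distinct value that appears in the data.
dummies = pd.get_dummies([1, 0])
encoded = dummies.astype(int).to_numpy()  # cast because newer pandas returns bools
print(encoded)

# Assumption: declaring the categories up front keeps both columns
# even when only one of the two values is present in the input.
only_ones = pd.Series(pd.Categorical([1, 1], categories=[0, 1]))
encoded_fixed = pd.get_dummies(only_ones).astype(int).to_numpy()
print(encoded_fixed)
```

The first result should match the 2-column layout asked for in the question; the second shows that the column for 0 is kept even though 0 never occurs.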

Upvotes: 0

mbednarski

Reputation: 798

You can use OneHotEncoder. For example:

In [37]: oh = preprocessing.OneHotEncoder(sparse=False)

In [38]: oh.fit([[0], [1]])
Out[38]:
OneHotEncoder(categorical_features='all', dtype=<type 'float'>,
       handle_unknown='error', n_values=2, sparse=False)

In [39]: oh.transform([[1], [0]])
Out[39]:
array([[ 0.,  1.],
       [ 1.,  0.]])
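
The session above comes from an older scikit-learn release; in newer releases the `sparse` argument was renamed to `sparse_output`, so the constructor call may need adjusting. A minimal sketch that sidesteps the keyword by converting the default sparse result with `.toarray()`. Passing `categories` explicitly is an addition here, not part of the session above, and is meant to keep both columns even if one value is missing from the data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fix the categories explicitly so a column exists for both 0 and 1.
oh = OneHotEncoder(categories=[[0, 1]])

# OneHotEncoder expects 2-D input: one row per sample, one column per feature.
X = np.array([[1], [0]])

# The default output is a sparse matrix; convert it to a dense array.
encoded = oh.fit_transform(X).toarray()
print(encoded)
```

As in the session above, the result is a float array with one column per category.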

Upvotes: 1
