Reputation: 21
I am having trouble encoding only the categorical columns with OneHotEncoder while leaving the continuous columns untouched. The encoder encodes every column, no matter what I pass as categorical_features. For example:
enc = preprocessing.OneHotEncoder()
enc.fit([[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]])
OneHotEncoder(categorical_features=[0,2],
handle_unknown='error', n_values='auto', sparse=True)
print enc.n_values_
print enc.feature_indices_
enc.transform([[0, 45, 3]]).toarray()
I only want to encode columns 1 and 3 (indices 0 and 2), leaving the middle column (values 40, 50, 45, 30) as continuous values. So I specify categorical_features=[0,2], but no matter what I do, the output of this code is still:
[ 2 51 4]
[ 0 2 53 57]
Out[129]:
array([[ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.]])
Upvotes: 1
Views: 10842
Reputation: 156
Why do you call the OneHotEncoder constructor twice? enc was created by the default constructor, so for enc you have categorical_features='all' (all features are treated as categorical). As I understand it, you need something like this:
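from sklearn.preprocessing import OneHotEncoder

# Pass categorical_features to the constructor of the encoder you actually fit
enc = OneHotEncoder(categorical_features=[0,2],
                    handle_unknown='error', n_values='auto', sparse=True)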
enc.fit([[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]])
print(enc.n_values_)
print(enc.feature_indices_)
enc.transform([[0, 45, 3]]).toarray()
and you will get
[2 4]
[0 2 6]
Out[23]: array([[ 1., 0., 0., 0., 0., 1., 45.]])
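Note that categorical_features and n_values were deprecated in scikit-learn 0.20 and removed in 0.22, so the snippet above only runs on older releases. On newer versions the same idea (encode columns 0 and 2, pass column 1 through unchanged) can be sketched with ColumnTransformer. This is a minimal sketch, not the only way to do it; the transformer label "onehot" is an arbitrary name chosen here, and sparse_threshold=0 is set only to get a dense array back for easy inspection.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Same training data as in the question: columns 0 and 2 are categorical,
# column 1 is continuous.
X = np.array([[0, 40, 3], [1, 50, 0], [0, 45, 1], [1, 30, 2]])

ct = ColumnTransformer(
    # "onehot" is just a label for this step (an arbitrary name).
    transformers=[("onehot", OneHotEncoder(handle_unknown="error"), [0, 2])],
    remainder="passthrough",  # keep the untouched column(s), appended at the end
    sparse_threshold=0,       # always return a dense array
)
ct.fit(X)

encoded = ct.transform(np.array([[0, 45, 3]]))
print(encoded)
```

The encoded columns come first and the passed-through continuous column is appended last, which is the same layout as the output shown above.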
Upvotes: 2