Reputation: 29
Some of my features have values such as excellent, good, average, bad, and so on. Obviously they are ordered, so I decided to use OrdinalEncoder in sklearn. I want the encoding to be: excellent=0, good=1, average=2, bad=3. But I find it has encoded them like this: excellent=2, good=3, bad=1, average=0.
How can I adjust the order? Does OrdinalEncoder have some parameter to control that?
Upvotes: 0
Views: 1719
Reputation: 7111
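By default (categories='auto'), OrdinalEncoder does not promise any particular ordering. It learns the categories from the training data, and the current sklearn source appears to do this with np.unique, which sorts them. For strings that means alphabetical order (average, bad, excellent, good), which is exactly the encoding you saw. You can set the ordering yourself by passing the categories parameter to the constructor: a list containing one list of categories per feature, each in the order you want. For your case, try this: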
from sklearn.preprocessing import OrdinalEncoder
# The actual answer: pass the desired order explicitly, one list per feature
oe = OrdinalEncoder(categories=[['excellent', 'good', 'average', 'bad']])
# Check the encoding on some sample data
X = [['bad'], ['excellent'], ['average'], ['bad'], ['good']]
oe.fit_transform(X)
Resulting in:
array([[3.],
       [0.],
       [2.],
       [3.],
       [1.]])
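For comparison, here is a minimal sketch (assuming a reasonably recent scikit-learn) that fits the same data first with the default categories='auto' and then with an explicit order across two columns. The second column and its low/medium/high values are hypothetical, added only to show that each feature gets its own list and that the lists need not have the same length. The categories_ attribute shows the order actually used, and inverse_transform decodes the codes back to labels.

```python
from sklearn.preprocessing import OrdinalEncoder

X = [['bad'], ['excellent'], ['average'], ['bad'], ['good']]

# Default (categories='auto'): categories are learned from the data and
# come out sorted, so strings are ranked alphabetically.
auto = OrdinalEncoder().fit(X)
print(auto.categories_)           # learned order: average, bad, excellent, good
print(auto.transform(X).ravel())  # [1. 2. 0. 1. 3.]

# Explicit order: one list per column, so columns may differ in length.
# Column 1 (low/medium/high) is a hypothetical extra feature.
X2 = [['bad', 'low'], ['excellent', 'high'], ['good', 'medium']]
explicit = OrdinalEncoder(categories=[
    ['excellent', 'good', 'average', 'bad'],  # column 0
    ['low', 'medium', 'high'],                # column 1
])
codes = explicit.fit_transform(X2)
print(codes)

# Decode the integer codes back to the original labels.
print(explicit.inverse_transform(codes))
```

With the default, the codes depend only on alphabetical order, so adding or renaming a label can silently shift every other code; the explicit list keeps the ranking tied to the meaning of the labels.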
The suggestion from the comments to use a dictionary instead is also a good option if you have a small number of known classes. A mapping such as {'excellent': 0, 'good': 1, 'average': 2, 'bad': 3} is simpler and more portable than carrying an OrdinalEncoder object around.
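A minimal sketch of the dictionary approach in plain Python. The names QUALITY_RANK and LABELS are illustrative, not from the original. Unknown labels raise KeyError here, which is roughly analogous to OrdinalEncoder's default of raising an error on unseen categories.

```python
# Explicit ranking of the quality labels (illustrative name).
QUALITY_RANK = {'excellent': 0, 'good': 1, 'average': 2, 'bad': 3}

values = ['bad', 'excellent', 'average', 'bad', 'good']

# Encode: a label missing from the dict raises KeyError.
codes = [QUALITY_RANK[v] for v in values]
print(codes)  # [3, 0, 2, 3, 1]

# Decode with the inverted mapping.
LABELS = {rank: label for label, rank in QUALITY_RANK.items()}
print([LABELS[c] for c in codes])
```

With pandas, Series.map(QUALITY_RANK) applies the same dictionary to a column, producing NaN for any label not in it.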
Upvotes: 4