zjh1001

Reputation: 29

About OrdinalEncoder in Python

Some of my features have values such as excellent, good, average, bad and so on. These values are clearly ordered, so I decided to use OrdinalEncoder from sklearn. I want to encode them like this: excellent=0, good=1, average=2, bad=3. But I find it has encoded them like this: excellent=2, good=3, bad=1, average=0.

How can I adjust the order? Does OrdinalEncoder have a parameter to control that?

Upvotes: 0

Views: 1719

Answers (1)

G__

Reputation: 7111

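By default, OrdinalEncoder does not know the order you have in mind. It determines the categories from the training data, and the current sklearn source appears to use np.unique to assign the ordinal to each value, which is consistent with the alphabetical result you got (average=0, bad=1, excellent=2, good=3). You can set the ordering yourself by passing a list of lists (features x categories, one list of categories per feature) as the categories parameter to the constructor. For your case, try this: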

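from sklearn.preprocessing import OrdinalEncoder
# This is the actual answer: list the categories in the order you want,
# one inner list per feature
oe = OrdinalEncoder(categories=[['excellent', 'good', 'average', 'bad']])
# Prove it on a single-feature sample (one row per observation)
X = [['bad'], ['excellent'], ['average'], ['bad'], ['good']]
print(oe.fit_transform(X))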

Resulting in:

array([[3.],
       [0.],
       [2.],
       [3.],
       [1.]])
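
As for why the default gave you a different order: if the default ordering really does come from np.unique, as it appears to, the categories end up sorted. A small sketch of that assumption, using only NumPy (the variable names are mine, not anything from sklearn):

```python
import numpy as np

# The labels in the order they were listed in the question
values = ['excellent', 'good', 'average', 'bad']

# np.unique returns the distinct values in sorted order
sorted_categories = np.unique(values)

# Position in the sorted array is what the ordinal would be
default_order = {str(c): i for i, c in enumerate(sorted_categories)}
print(default_order)
```

This gives average=0, bad=1, excellent=2, good=3, which matches the encoding you saw.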

The advice from the comments to create a dictionary instead is not a bad option either if you have a small number of known classes. {'excellent':0, 'good':1, 'average':2, 'bad':3} is simpler and more portable than lugging around an OrdinalEncoder object.
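
A minimal sketch of the dictionary approach in plain Python, assuming the labels sit in an ordinary list (the names quality_order and values are only for illustration):

```python
# Explicit mapping: the order is whatever you write down
quality_order = {'excellent': 0, 'good': 1, 'average': 2, 'bad': 3}

values = ['bad', 'excellent', 'average', 'bad', 'good']

# Look up each label; a label missing from the mapping raises KeyError
encoded = [quality_order[v] for v in values]
print(encoded)  # [3, 0, 2, 3, 1]
```

The KeyError on an unknown label is arguably a feature here, since a typo such as 'excelent' fails loudly instead of being silently encoded.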

Upvotes: 4
