Reputation: 4235
When using LabelEncoder to encode categorical variables into numerics, how does one keep a dictionary in which the transformation is tracked?
I.e., a dictionary in which I can see which values became what:
{'A':1,'B':2,'C':3}
Upvotes: 7
Views: 12815
Reputation:
You could do it in a single line:
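from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# classes_ is an attribute of the fitted encoder, so the closing parenthesis of fit() comes before it
my_encodings = {l: i for (i, l) in enumerate(le.fit(data["target"]).classes_)}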
Upvotes: 0
Reputation: 6355
I created a dictionary from classes_:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
ids = le.fit_transform(labels)
mapping = dict(zip(le.classes_, range(len(le.classes_))))
To test:
all([mapping[x] for x in le.inverse_transform(ids)] == ids)
should return True.
This works because fit_transform uses numpy.unique to simultaneously calculate the label encoding and the classes_ attribute:
def fit_transform(self, y):
    self.classes_, y = np.unique(y, return_inverse=True)
    return y
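To see what numpy.unique is doing there, here is a minimal sketch that uses NumPy alone, without scikit-learn. The sample labels and the names classes, codes, mapping and reverse_mapping are made up for illustration; they are not part of LabelEncoder.

```python
import numpy as np

# Hypothetical sample data, for illustration only.
labels = np.array(["B", "A", "C", "A", "B"])

# np.unique returns the sorted distinct values and, with return_inverse=True,
# the position of each original element within that sorted array.
classes, codes = np.unique(labels, return_inverse=True)

# Label -> integer code: the dictionary the question asks for.
# tolist() converts NumPy scalars to plain Python objects.
mapping = {label: index for index, label in enumerate(classes.tolist())}

# Integer code -> label, for decoding.
reverse_mapping = {index: label for label, index in mapping.items()}

print(mapping)
print(codes.tolist())
```

Note that the codes start at 0, not 1 as in the question's example, because they are indices into the sorted array of classes.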
Upvotes: 16