Reputation: 4235

LabelEncoder: How to keep a dictionary that shows original and converted variable

When using LabelEncoder to encode categorical variables as numbers, how does one keep a dictionary that tracks the transformation?

That is, a dictionary in which I can see which value became what:

{'A':1,'B':2,'C':3}

Upvotes: 7

Answers (2)

Reputation:

You could do it in a single line:

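from sklearn import preprocessing

le = preprocessing.LabelEncoder()
# fit() returns the fitted encoder, so classes_ must be read from its result
my_encodings = {l: i for (i, l) in enumerate(le.fit(data["target"]).classes_)}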

Upvotes: 0

Reputation: 6355

I created a dictionary from classes_:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
ids = le.fit_transform(labels)  # labels: your list/array of categorical values
mapping = dict(zip(le.classes_, range(len(le.classes_))))

To test:

all([mapping[x] for x in le.inverse_transform(ids)] == ids)

should return True.
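For a self-contained version of the above, here is a minimal sketch. The sample `labels` list and the names `mapping` and `inverse_mapping` are made up for illustration. It also casts the keys and values to plain Python types, since `classes_` and the encoded ids are NumPy values.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["B", "A", "C", "A", "B"]  # hypothetical sample data

le = LabelEncoder()
ids = le.fit_transform(labels)

# classes_ holds the sorted unique labels; a label's code is its index there.
# Cast to plain str/int so the dict prints and serializes cleanly.
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}

# Reverse lookup: code -> original label
inverse_mapping = {code: label for label, code in mapping.items()}

print(mapping)
print([inverse_mapping[int(i)] for i in ids])
```

Note that the codes start at 0, so this gives {'A': 0, 'B': 1, 'C': 2} rather than the 1-based dictionary shown in the question. If you need 1-based values, add 1 when building the dict.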

This works because fit_transform uses numpy.unique to compute the classes_ attribute and the encoded labels in a single call, so each label's code is simply its index in the sorted classes_ array:

def fit_transform(self, y):
    self.classes_, y = np.unique(y, return_inverse=True)
    return y
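To see what that numpy.unique call returns, here is a small sketch using NumPy alone. The sample array `y` and the variable names are illustrative only, and the actual scikit-learn source may differ from the excerpt above depending on the version.

```python
import numpy as np

y = np.array(["B", "A", "C", "A", "B"])  # hypothetical sample labels

# classes: sorted unique values of y
# codes: for each element of y, its index into classes
classes, codes = np.unique(y, return_inverse=True)

# The label -> code dictionary falls out of the position in classes
mapping = {str(c): i for i, c in enumerate(classes)}

# Indexing classes with codes reconstructs the original array
roundtrip = classes[codes]

print(mapping)
print(codes)
```

Because the codes are indices into the sorted unique values, the dictionary built from classes_ always matches what the encoder produced.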

Upvotes: 16