carguerriero

Reputation: 141

How to reverse Label Encoder from sklearn for multiple columns?

I would like to use the inverse_transform function for LabelEncoder on multiple columns.

This is the code I use to apply LabelEncoder to more than one column of a dataframe:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self,columns = None):
        self.columns = columns # array of column names to encode

    def fit(self,X,y=None):
        return self # not relevant here

    def transform(self,X):
        '''
        Transforms columns of X specified in self.columns using
        LabelEncoder(). If no columns specified, transforms all
        columns in X.
        '''
        output = X.copy()
        if self.columns is not None:
            for col in self.columns:
                output[col] = LabelEncoder().fit_transform(output[col])
        else:
            for colname,col in output.items(): # iteritems() was removed in pandas 2.0
                output[colname] = LabelEncoder().fit_transform(col)
        return output

    def fit_transform(self,X,y=None):
        return self.fit(X,y).transform(X)

Is there a way to modify the code so that it can also be used to invert the labels from the encoder?

Thanks

Upvotes: 6

Views: 19548

Answers (3)

Bex T.

Reputation: 1786

LabelEncoder() is meant for encoding the target only. It accepts a single 1-D column, which is why you can't apply it to multiple columns at once the way you can with other transformers. The alternative is OrdinalEncoder, which does the same job as LabelEncoder but can be used on all categorical columns at the same time, just like OneHotEncoder:

from sklearn.preprocessing import OrdinalEncoder

oe = OrdinalEncoder()
X = oe.fit_transform(X)
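Since the question is about reversing the encoding, here is a minimal sketch of a round trip with OrdinalEncoder and its inverse_transform method. The sample dataframe and the cat_cols list are made up for illustration and are not part of the original answer:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample data; only the columns in cat_cols are encoded.
df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
cat_cols = ['city', 'size']

oe = OrdinalEncoder()

# Encode all categorical columns with one fitted encoder.
encoded = df.copy()
encoded[cat_cols] = oe.fit_transform(df[cat_cols])

# The same fitted encoder maps the codes back to the original labels.
decoded = encoded.copy()
decoded[cat_cols] = oe.inverse_transform(encoded[cat_cols])

print(encoded)
print(decoded)
```

Note that OrdinalEncoder returns floats, so the encoded columns hold values such as 0.0 and 2.0 rather than integers.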

Upvotes: 1

seralouk

Reputation: 33147

You do not need to modify it this way. It's already implemented as a method inverse_transform.

Example:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
df = ["paris", "paris", "tokyo", "amsterdam"]

le_fitted = le.fit_transform(df)

inverted = le.inverse_transform(le_fitted)

print(inverted)
# ['paris' 'paris' 'tokyo' 'amsterdam']

Upvotes: 4

gereleth

Reputation: 2482

In order to inverse transform the data you need to remember the encoders that were used to transform every column. A possible way to do this is to save the LabelEncoders in a dict inside your object. The way it would work:

  • when you call fit the encoders for every column are fit and saved
  • when you call transform they get used to transform data
  • when you call inverse_transform they get used to do the inverse transformation

Example code:

from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns # array of column names to encode


    def fit(self, X, y=None):
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self


    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output


    def fit_transform(self, X, y=None):
        return self.fit(X,y).transform(X)


    def inverse_transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output

You can then use it like this:

import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city','size'])
df = pd.DataFrame({'city':    ['London','Paris','Moscow'],
                   'size':    ['M',     'M',    'L'],
                   'quantity':[12,       1,      4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4

fit_transform could also be implemented separately, so that it calls fit_transform on each LabelEncoder directly instead of chaining fit and transform. Either way, make sure to keep the fitted encoders around for when you need the inverse transformation.
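A rough sketch of what such a separate fit_transform might look like is below. It is only one possible way to write it: the class is trimmed to the two methods needed for the round trip, and the sample dataframe is the same illustrative one used above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoder:

    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all

    def fit_transform(self, X, y=None):
        output = X.copy()
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            # Fit and encode in one call, then keep the fitted encoder.
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoder(columns=['city', 'size'])
X = multi.fit_transform(df)
inv = multi.inverse_transform(X)
print(inv)
```

The encoders dict is filled inside fit_transform here, so inverse_transform works without a separate call to fit.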

Upvotes: 10
