Reputation: 17164

How to use LabelEncoder in sklearn make_column_transformer?

How can I use LabelEncoder in an sklearn pipeline?

NOTE: The following code works for OneHotEncoder but fails for LabelEncoder. How can I use LabelEncoder in this situation?

MWE

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import make_column_transformer
import sklearn

print(sklearn.__version__) # 0.22.2.post1

df = sns.load_dataset('titanic').head()

le = OneHotEncoder() # this works
# le = LabelEncoder() # this fails

ct = make_column_transformer(
    (le, ['sex','adult_male','alone']),
    remainder='drop')

ct.fit_transform(df)

$$\begin{align}\mathsf P(N\mid E)&=\dfrac{\mathsf P(N\cap E)}{\mathsf P(E)}\\[2ex]&=\dfrac{\mathsf P(N\cap E\mid F)\,\mathsf P(F)+\mathsf P(N\cap E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}{\mathsf P(E\mid F)\,\mathsf P(F)+\mathsf P(E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}\end{align}$$

Upvotes: 2

Answers (3)

Lee Johns

Reputation: 1

I know this thread is a few years old, but I ran into the same issue and found a workaround: wrap LabelEncoder in a custom subclass of BaseEstimator and TransformerMixin (both from sklearn.base) that defines fit, transform, fit_transform and inverse_transform. Then, in the ColumnTransformer, pass a separate instance of the class for each column using a simple list comprehension.

For example, you could go for a custom class as follows:

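from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder

class CustomLabelEncoder(BaseEstimator, TransformerMixin):
    """Wraps LabelEncoder so it can be applied to a single feature column."""
    def __init__(self):
        self.le = LabelEncoder()

    def fit(self, X, y=None):
        self.le.fit(X)
        return self

    def transform(self, X):
        # ColumnTransformer expects 2D output, so return a single column
        return self.le.transform(X).reshape(-1, 1)

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

    def inverse_transform(self, X_encoded):
        return self.le.inverse_transform(X_encoded.ravel())

# One encoder per column; a string selector (col, not [col]) passes a 1D column
ct = ColumnTransformer(
    [(f'enc_{col}', CustomLabelEncoder(), col) for col in ['sex', 'adult_male', 'alone']],
    remainder='drop')

ct.fit_transform(df)  # df as defined in the question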

Hope this helps :))

Upvotes: 0

Bex T.

Reputation: 1806

LabelEncoder is designed specifically for encoding the target variable, y. That's why you can't use it to transform multiple columns at once, the way you can with OneHotEncoder.

Sklearn provides OrdinalEncoder for this case. It is meant for features and can encode multiple columns at once.
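As a minimal sketch of that swap, here is the question's transformer with OrdinalEncoder in place of LabelEncoder. The small hand-built DataFrame is an assumption standing in for the first rows of the titanic dataset, so the snippet runs without downloading anything.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for sns.load_dataset('titanic').head()
df = pd.DataFrame({
    "sex": ["male", "female", "female", "female", "male"],
    "adult_male": [True, False, False, False, True],
    "alone": [False, False, True, False, True],
    "age": [22.0, 38.0, 26.0, 35.0, 35.0],
})

# OrdinalEncoder accepts a 2D block of feature columns
ct = make_column_transformer(
    (OrdinalEncoder(), ["sex", "adult_male", "alone"]),
    remainder="drop",  # 'age' is dropped from the output
)

encoded = ct.fit_transform(df)
print(encoded)
```

Each column gets its own integer codes, assigned in sorted order of the column's categories, so here 'female' becomes 0.0 and 'male' becomes 1.0.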

Upvotes: 1

joesph nguyen

Reputation: 108

From the docs, OneHotEncoder can take a DataFrame and convert the categorical columns into the vectors you see. LabelEncoder takes a Series (your y / dependent variable) and generates new labels.

OneHotEncoder's usage: fit_transform(X,[y])

LabelEncoder's usage: fit_transform(y)

That's why it'll tell you: "fit_transform() takes 2 positional arguments but 3 were given"
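A rough way to see this outside of the column transformer is to call LabelEncoder the way a feature transformer gets called, with both X and y. The one-column frame below is made up for illustration, and the exact wording of the message may differ between Python and sklearn versions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical feature frame, just for illustration
X = pd.DataFrame({"sex": ["male", "female", "female"]})

try:
    # Imitates a feature-transformer call: features plus an (optional) target
    LabelEncoder().fit_transform(X, None)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```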

Just call LabelEncoder fit_transform on the y directly if you really want to use it. Here is a similar question: How to use sklearn Column Transformer?
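Something like the following sketch shows the direct call on y. The target Series here is invented for the example and is not taken from the question's data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column
y = pd.Series(["no", "yes", "yes", "no", "yes"])

le = LabelEncoder()
y_encoded = le.fit_transform(y)  # 1D in, 1D out

print(le.classes_)   # ['no' 'yes']
print(y_encoded)     # [0 1 1 0 1]

# The fitted encoder can map the integers back to the original labels
y_restored = le.inverse_transform(y_encoded)
```

The features can still go through a column transformer separately; only the target is handled by LabelEncoder.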

Here are the docs:

Upvotes: 2
