Reputation: 17164
How to use LabelEncoder in sklearn pipeline?
Note: the following code works with OneHotEncoder but fails with LabelEncoder. How can I use LabelEncoder in this situation?
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import make_column_transformer
import sklearn
print(sklearn.__version__) # 0.22.2.post1
df = sns.load_dataset('titanic').head()
le = OneHotEncoder() # this success
# le = LabelEncoder() # this fails
ct = make_column_transformer(
(le, ['sex','adult_male','alone']),
remainder='drop')
ct.fit_transform(df)
Upvotes: 2
Views: 2499
Reputation: 1
I know this is a thread from a few years ago, but I ran into the same issue and found a workaround: wrap LabelEncoder in a custom child class of BaseEstimator and TransformerMixin (both from sklearn.base) that defines fit, transform, fit_transform and inverse_transform. Then, in the ColumnTransformer, pass one instance of that class per column using a simple list comprehension.
For example, you could go for a custom class as follows:
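from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import LabelEncoder

class CustomLabelEncoder(BaseEstimator, TransformerMixin):
    """Wraps LabelEncoder so it can encode one feature column inside a ColumnTransformer."""
    def __init__(self):
        self.le = LabelEncoder()
    def fit(self, X, y=None):
        # X is the single 1-D column selected by ColumnTransformer
        self.le.fit(X)
        return self
    def transform(self, X):
        # Reshape to a 2-D column so ColumnTransformer can stack the outputs
        return self.le.transform(X).reshape(-1, 1)
    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)
    def inverse_transform(self, X_encoded):
        return self.le.inverse_transform(X_encoded.ravel())

# One encoder per column; a scalar column name makes ColumnTransformer pass a 1-D Series
ct = ColumnTransformer([(f'enc_{col}', CustomLabelEncoder(), col) for col in ['sex', 'adult_male', 'alone']],
                       remainder='drop')
ct.fit_transform(df)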
Hope this helps :))
Upvotes: 0
Reputation: 1806
LabelEncoder was designed specifically for encoding the target variable, y. That's why you can't use it to transform multiple columns at once, as you can with OneHotEncoder.
Sklearn provides OrdinalEncoder for this situation. It can encode multiple feature columns at once.
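As a minimal sketch of that suggestion, here is the question's make_column_transformer call with OrdinalEncoder swapped in for LabelEncoder. To keep it runnable offline, it assumes a small hand-made DataFrame instead of downloading the titanic data; the values are illustrative only.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Hand-made stand-in for sns.load_dataset('titanic').head() (illustrative values)
df = pd.DataFrame({
    'sex': ['male', 'female', 'female', 'female', 'male'],
    'adult_male': [True, False, False, False, True],
    'alone': [False, False, True, False, True],
})

# OrdinalEncoder accepts a 2-D X, so one instance can encode all three columns;
# categories are sorted per column, e.g. 'female' -> 0, 'male' -> 1
ct = make_column_transformer(
    (OrdinalEncoder(), ['sex', 'adult_male', 'alone']),
    remainder='drop')
encoded = ct.fit_transform(df)
print(encoded)
```

Unlike the custom wrapper approach, this needs no extra class, and the fitted encoder keeps the per-column categories in its categories_ attribute.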
Upvotes: 1
Reputation: 108
From the docs, OneHotEncoder can take a DataFrame and convert its categorical columns into the one-hot vectors you see. LabelEncoder takes a Series (your y, the dependent variable) and encodes it as integer labels.
OneHotEncoder's usage: fit_transform(X, y=None)
LabelEncoder's usage: fit_transform(y)
Since ColumnTransformer calls fit_transform(X, y) on each transformer, that's why it tells you: "fit_transform() takes 2 positional arguments but 3 were given".
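A small sketch that reproduces this error, assuming a tiny hand-made DataFrame in place of the titanic data. It catches the TypeError so the message can be inspected; the exact prefix of the message (for example whether it includes the class name) depends on the Python version.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder

# Hand-made stand-in for the question's titanic columns (illustrative values)
df = pd.DataFrame({'sex': ['male', 'female', 'female'],
                   'alone': [False, False, True]})

# ColumnTransformer calls transformer.fit_transform(X, y), but
# LabelEncoder.fit_transform only accepts (self, y), so Python sees
# three positional arguments (self, X, y) where two are expected.
ct = make_column_transformer((LabelEncoder(), ['sex', 'alone']), remainder='drop')
try:
    ct.fit_transform(df)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```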
Just call LabelEncoder's fit_transform on y directly if you really want to use it. Here is a similar question: How to use sklearn Column Transformer?
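For completeness, a minimal sketch of using LabelEncoder the way it is intended, on the target alone. The string labels here are hypothetical; any 1-D sequence of class labels works the same way.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical target values; LabelEncoder is meant for y, not for feature columns
y = ['survived', 'died', 'died', 'survived', 'died']

le = LabelEncoder()
y_encoded = le.fit_transform(y)              # classes_ are sorted: ['died', 'survived']
y_decoded = le.inverse_transform(y_encoded)  # maps the integers back to the original labels

print(le.classes_)   # ['died' 'survived']
print(y_encoded)     # [1 0 0 1 0]
```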
Here are the docs:
Upvotes: 2