Reputation: 17154
How to use LabelEncoder in sklearn pipeline?
Note: the following code works with OneHotEncoder but fails with LabelEncoder. How can I use LabelEncoder in this situation?
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import make_column_transformer
import sklearn
print(sklearn.__version__) # 0.22.2.post1
df = sns.load_dataset('titanic').head()
le = OneHotEncoder()  # this works
# le = LabelEncoder()  # this fails
ct = make_column_transformer(
    (le, ['sex', 'adult_male', 'alone']),
    remainder='drop')
ct.fit_transform(df)
$$\begin{align}\mathsf P(N\mid E)&=\dfrac{\mathsf P(N\cap E)}{\mathsf P(E)}\\[2ex]&=\dfrac{\mathsf P(N\cap E\mid F)\,\mathsf P(F)+\mathsf P(N\cap E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}{\mathsf P(E\mid F)\,\mathsf P(F)+\mathsf P(E\mid F^{\small\complement})\,\mathsf P(F^{\small\complement})}\end{align}$$
Upvotes: 2
Views: 2411
Reputation: 1786
LabelEncoder was designed specifically for encoding the target variable, y. That's why you can't use it to transform multiple columns at once, as you can with OneHotEncoder.
Sklearn provides OrdinalEncoder for this case. It can encode multiple feature columns at once.
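As a rough sketch of that suggestion, the snippet below swaps OrdinalEncoder into the question's column transformer. The small DataFrame is a hand-made stand-in for the titanic rows (an assumption, so the example runs without downloading the dataset), and its values are illustrative only.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical stand-in for sns.load_dataset('titanic').head()
df = pd.DataFrame({
    "sex": ["male", "female", "female", "female", "male"],
    "adult_male": [True, False, False, False, True],
    "alone": [False, False, True, False, True],
    "fare": [7.25, 71.28, 7.92, 53.10, 8.05],
})

# OrdinalEncoder accepts a 2D block of feature columns,
# so it can sit where OneHotEncoder sat in the question.
ct = make_column_transformer(
    (OrdinalEncoder(), ["sex", "adult_male", "alone"]),
    remainder="drop",
)

# One integer code per category, one output column per input column
encoded = ct.fit_transform(df)
print(encoded.shape)
print(encoded)
```

Note that the integer codes imply an ordering between categories, which may or may not suit the downstream model; OneHotEncoder avoids that at the cost of more columns.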
Upvotes: 1
Reputation: 108
From the docs, OneHotEncoder can take a DataFrame and convert the categorical columns into the vectors you see. LabelEncoder takes a Series (your y, the dependent variable) and generates new labels.
OneHotEncoder's usage: fit_transform(X, [y])
LabelEncoder's usage: fit_transform(y)
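The signature mismatch can be reproduced outside a pipeline. This is a minimal sketch, assuming the column transformer passes both X and y to each step's fit_transform; the tiny X and y below are made-up data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical feature block and target
X = pd.DataFrame({"sex": ["male", "female", "female"]})
y = ["no", "yes", "yes"]

# OneHotEncoder accepts (X, y) and simply ignores y
one_hot = OneHotEncoder().fit_transform(X, y)

# LabelEncoder only accepts y, so the extra argument raises TypeError
try:
    LabelEncoder().fit_transform(X, y)
    message = None
except TypeError as exc:
    message = str(exc)

print(message)
```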
That's why it'll tell you: "fit_transform() takes 2 positional arguments but 3 were given"
If you really want to use it, just call LabelEncoder's fit_transform on y directly. Here is a similar question: How to use sklearn Column Transformer?
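For illustration, calling it on the target directly might look like the sketch below. The y Series is a made-up target column, not taken from the question.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical target column
y = pd.Series(["no", "yes", "yes", "yes", "no"], name="alive")

le = LabelEncoder()
y_encoded = le.fit_transform(y)  # 1D input, 1D integer output

print(le.classes_)   # learned labels, in sorted order
print(y_encoded)     # integer code for each row of y

# The fitted encoder can map predictions back to the original labels
y_restored = le.inverse_transform(y_encoded)
```

The features can then go through the column transformer separately, with y_encoded passed as the target when fitting the model.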
Here are the docs:
Upvotes: 2