Jernej

Reputation: 352

OneHotEncoder raising NaN issue after SimpleImputer has been called already

I have trouble understanding how pipelines are supposed to work in scikit-learn. The following is an example using the Titanic dataset.

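import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

data = pd.read_csv('datasets/train.csv')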

cat_attribs = ["Embarked", "Cabin", "Ticket", "Name"]

num_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="median")),
    ])


str_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy="most_frequent")),
    ])


full_pipeline = ColumnTransformer([
        ("num", num_pipeline, ["Pclass", "Age", "SibSp", "Parch", "Fare"]),
        ("str", str_pipeline, ["Cabin", "Sex"]),
        ("cat", OneHotEncoder(), ["Cabin"]),
    ])

full_pipeline.fit_transform(data)

I'd expect this to fill all missing (NaN) values in both the numeric and the string attributes, and then transform the Cabin attribute into a numerical one.

Instead the code ends up with the following error:

ValueError: Input contains NaN.

If I remove the line calling the OneHotEncoder and print the transformed array, there is no NaN value.

Hence I wonder: how am I supposed to call OneHotEncoder in this situation?

Upvotes: 3

Views: 2370

Answers (1)

Venkatachalam

Reputation: 16966

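I would recommend applying OneHotEncoder to all categorical variables, so put imputation and encoding together in a separate pipeline. ColumnTransformer applies each transformer to the original columns independently, not one after another, so the OneHotEncoder in the question receives the raw Cabin column, NaN values included.

Since the numerical columns need only a single step, you can pass the SimpleImputer to the ColumnTransformer directly.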

Try this!

from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline, make_pipeline

cat_preprocess = make_pipeline(SimpleImputer(strategy="most_frequent"), OneHotEncoder())

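# ColumnTransformer takes a list of (name, transformer, columns) triples;
# make_column_transformer instead takes unnamed (transformer, columns) pairs.
ct = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), ["Pclass", "Age", "SibSp", "Parch", "Fare"]),
        ("str", cat_preprocess, ["Cabin", "Sex"]),
    ])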

pipeline = Pipeline([('preprocess', ct)])
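
If you don't have the CSV at hand, here is a minimal sketch of the same idea on a made-up frame. The `toy` data and its values are hypothetical and only stand in for the Titanic columns; the point is just that imputing and encoding inside one pipeline leaves no NaN in the output.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with missing values in every column.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, np.nan, 51.86],
    "Cabin": ["C85", np.nan, "C85", "E46"],
    "Sex": ["male", "female", "female", np.nan],
})
# Assumption: force object dtype for the string columns so the example
# behaves the same regardless of the pandas default string dtype.
toy = toy.astype({"Cabin": object, "Sex": object})

# Impute first, then one-hot encode, as two steps of a single pipeline.
cat_preprocess = make_pipeline(
    SimpleImputer(strategy="most_frequent"),
    OneHotEncoder(),
)

ct = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["Age", "Fare"]),
    ("cat", cat_preprocess, ["Cabin", "Sex"]),
])

result = ct.fit_transform(toy)
# The result may be sparse or dense depending on its density; normalise it.
dense = result.toarray() if hasattr(result, "toarray") else np.asarray(result)
dense = dense.astype(float)

print(dense.shape)
print(np.isnan(dense).any())
```

The output has two imputed numeric columns followed by the one-hot columns for Cabin and Sex, and the NaN check comes back False.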

Upvotes: 2
