Creating a pipeline for one-hot encoded variables not working

Question

I am trying to apply transformations to my categorical feature 'country' and to the rest of my columns, which are numerical. How can I do this? My attempt is below:

preprocess = make_column_transformer(
    (numeric_cols, make_pipeline(MinMaxScaler())),
    (categorical_cols, OneHotEncoder()))

model = make_pipeline(preprocess,XGBClassifier())

model.fit(X_train, y_train)

Note that numeric_cols and categorical_cols are both passed as lists.

However, I get this error: TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. It is followed by a list of all my numerical columns and "(type <class 'list'>) doesn't."

What am I doing wrong? Also, how can I deal with unseen categories in the country column?

StupidWolf · Accepted Answer

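Each tuple must list the transformer first and the columns second, i.e. (transformer, columns); you have them the other way round. As a result, your list of columns is treated as the estimator, and since a list has no fit or transform, you get the TypeError. The help page gives the signature as: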

sklearn.compose.make_column_transformer(*transformers, **kwargs)

Something like the below will work:

from sklearn.preprocessing import OneHotEncoder, MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline

from xgboost import XGBClassifier

import numpy as np
import pandas as pd

# Toy data: one numeric column and one categorical column
X = pd.DataFrame({'x1':np.random.uniform(0,1,5),
                   'x2':np.random.choice(['A','B'],5)})

# Integer labels with both classes present; newer XGBoost versions
# expect class labels 0..n-1 rather than strings
y = pd.Series([0, 1, 0, 1, 0])

numeric_cols = X.select_dtypes('number').columns.to_list()
categorical_cols = X.select_dtypes('object').columns.to_list()

# Each tuple is (transformer, columns)
preprocess = make_column_transformer(
    (MinMaxScaler(), numeric_cols),
    (OneHotEncoder(), categorical_cols)
    )

model = make_pipeline(preprocess, XGBClassifier())
model.fit(X, y)

Pipeline(steps=[('columntransformer',
                 ColumnTransformer(transformers=[('minmaxscaler',
                                                  MinMaxScaler(), ['x1']),
                                                 ('onehotencoder',
                                                  OneHotEncoder(), ['x2'])])),
                ('xgbclassifier', XGBClassifier())])
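
As for unseen categories in country: one common option is to pass handle_unknown='ignore' to OneHotEncoder, so that a category not seen during fit is encoded as all zeros instead of raising an error. The sketch below is only an illustration under assumptions: the toy data, the country codes, and the names X_train and X_new are made up, and the classifier is left out so that only the preprocessing step is shown.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy data; 'country' stands in for the column from the question
X_train = pd.DataFrame({'x1': [0.1, 0.5, 0.9, 0.3],
                        'country': ['FR', 'DE', 'FR', 'DE']})
# 'JP' never appears in the training data
X_new = pd.DataFrame({'x1': [0.4], 'country': ['JP']})

preprocess = make_column_transformer(
    (MinMaxScaler(), ['x1']),
    # handle_unknown='ignore' encodes unseen categories as all zeros
    (OneHotEncoder(handle_unknown='ignore'), ['country']),
)
preprocess.fit(X_train)

encoded = preprocess.transform(X_new)
# The result may be a sparse matrix depending on density; make it dense for printing
encoded = encoded.toarray() if hasattr(encoded, 'toarray') else encoded
print(encoded)  # one scaled x1 value, then zeros for both known countries
```

With this setting the downstream model sees an unseen country as "none of the known countries". Whether that is acceptable depends on your data; with the default handle_unknown='error', the transform would raise on 'JP' instead.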
