Reputation: 81
i have a problem where i am trying to apply transformations to my catgeorical feature 'country' and the rest of my numerical columns. how can i do this as i am trying below:
preprocess = make_column_transformer(
(numeric_cols, make_pipeline(MinMaxScaler())),
(categorical_cols, OneHotEncoder()))
model = make_pipeline(preprocess,XGBClassifier())
model.fit(X_train, y_train)
note that numeric_cols is passed as a list and so is categorical_cols.
however i get this error: TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers.
along with a list of all my numerical columns (type <class 'list'>) doesn't.
what am i doing wrong, also how can i deal with unseen categories in column country?
Upvotes: 1
Views: 195
Reputation: 47008
You need to put the transform function first, then the columns as subsequent arguments, if you check out the help page, it writes:
sklearn.compose.make_column_transformer(*transformers, **kwargs)
Some like below will work:
from sklearn.preprocessing import StandardScaler, OneHotEncoder,MinMaxScaler
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier
import numpy as np
import pandas as pd
X = pd.DataFrame({'x1':np.random.uniform(0,1,5),
'x2':np.random.choice(['A','B'],5)})
y = pd.Series(np.random.choice(['0','1'],5))
numeric_cols = X.select_dtypes('number').columns.to_list()
categorical_cols = X.select_dtypes('object').columns.to_list()
preprocess = make_column_transformer(
(MinMaxScaler(),numeric_cols),
(OneHotEncoder(),categorical_cols)
)
model = make_pipeline(preprocess,XGBClassifier())
model.fit(X,y)
Pipeline(steps=[('columntransformer',
ColumnTransformer(transformers=[('minmaxscaler',
MinMaxScaler(), ['x1']),
('onehotencoder',
OneHotEncoder(), ['x2'])])),
('xgbclassifier', XGBClassifier())])
Upvotes: 1