Reputation: 107
How can I do all of this in a single pipeline? Is a value missing, or have I defined something incorrectly?
#instantiate
imputer = SimpleImputer()
ohe = OneHotEncoder(use_cat_names=True)
#fit
imputer.fit(X_train)
ohe.fit(X_train)
#transform
XT_train = imputer.transform(X_train["lat","lon"])
XT_train = ohe.transform(X_train["neighborhood"])
model = make_pipeline(
    SimpleImputer(),
    OneHotEncoder(use_cat_names=True),
    Ridge()
)
model.fit(X_train, y_train)
The console shows an error like:
Upvotes: 0
Views: 323
Reputation: 33
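Yes, it's magical under the hood, but OneHotEncoder has to come before SimpleImputer. SimpleImputer defaults to strategy="mean", which cannot be computed on a string column such as neighborhood, so if you start the pipeline with SimpleImputer you get the error you saw. I hit it too. Changing the order solved the issue, and here is the resulting pipeline: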
Pipeline(steps=[('onehotencoder',
                 OneHotEncoder(cols=['neighborhood'], use_cat_names=True)),
                ('simpleimputer', SimpleImputer()),
                ('ridge', Ridge())])
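To see why the order matters, here is a minimal sketch that fits SimpleImputer directly on a frame that still contains a text column. The toy data is made up for illustration (only the column names lat, lon and neighborhood come from the question), and the exact error message may differ between scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy data; only the column names come from the question.
X = pd.DataFrame({
    "lat": [19.4, np.nan, 19.3],
    "lon": [-99.1, -99.2, np.nan],
    "neighborhood": ["Centro", "Roma", "Centro"],
})

# The default strategy is "mean", which needs numeric input,
# so fitting on the raw frame is expected to fail.
try:
    SimpleImputer().fit(X)
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print("SimpleImputer failed:", exc)
```

Once the text column has been encoded into numeric columns, the same imputer has nothing non-numeric left to choke on, which is what putting OneHotEncoder first achieves.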
Upvotes: 0
Reputation: 107
#instantiate
imputer = SimpleImputer()
ohe = OneHotEncoder(use_cat_names=True)
#fit
imputer.fit(X_train)
ohe.fit(X_train)
#transform
XT_train = imputer.transform(X_train["lat","lon"])
XT_train = ohe.transform(X_train["neighborhood"])
Remove all of the lines above. OneHotEncoder automatically detects the categorical columns in the feature matrix, and the same goes for SimpleImputer: it finds the numeric NaN values and fills them in.
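# Imports (OneHotEncoder is assumed to be the category_encoders one, since it takes use_cat_names)
from category_encoders import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Build model: encode first, then impute, then fit the regressor
model = make_pipeline(
    OneHotEncoder(use_cat_names=True),
    SimpleImputer(),
    Ridge()
)
# Fit model
model.fit(X_train, y_train)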
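If you would rather keep the "impute lat/lon, encode neighborhood" split from the question explicit, one alternative is scikit-learn's ColumnTransformer. This is a sketch under assumptions, not what the pipeline above does: it uses scikit-learn's own OneHotEncoder (not the category_encoders one, so there is no use_cat_names), and X_train and y_train here are made-up toy data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder  # scikit-learn's encoder

# Hypothetical toy data; only the column names come from the question.
X_train = pd.DataFrame({
    "lat": [19.40, np.nan, 19.30, 19.50],
    "lon": [-99.10, -99.20, np.nan, -99.00],
    "neighborhood": ["Centro", "Roma", "Centro", "Condesa"],
})
y_train = pd.Series([100.0, 150.0, 110.0, 170.0])

# Route each group of columns to the transformer that can handle it.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(), ["lat", "lon"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
])

# Build and fit the model in one pipeline
model = make_pipeline(preprocess, Ridge())
model.fit(X_train, y_train)
predictions = model.predict(X_train)
```

Because each transformer only sees its own columns, the step order problem goes away, at the cost of listing the column names yourself.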
Upvotes: 1