Reputation: 11
I am trying to scale both the X feature data and the y output data in my sklearn pipeline. My code is below; it uses grid search with cross-validation to find the optimum number of latent variables (LVs).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
kfold = KFold(n_splits=5, shuffle=False)  # 5-fold CV without shuffling
pipeline = Pipeline(steps=[('preprocessor', StandardScaler()), ('model', PLSRegression())])  # scales X only
param_grid = {'model__n_components': np.arange(1, 10)}  # candidate numbers of components (1 to 9)
search = GridSearchCV(pipeline, param_grid, scoring='neg_mean_squared_error', cv=kfold, refit=True)  # refit best model on the full training set
search.fit(Xtrain, Ytrain)  # run the grid search
Upvotes: 1
Views: 2324
Reputation: 1092
In your pipeline, replace PLSRegression() with TransformedTargetRegressor(regressor=PLSRegression(), transformer=StandardScaler()). That should combine the target transformer into the sklearn pipeline.
Upvotes: 1
Reputation: 5164
Pipeline objects are meant to apply a series of transformations to the features before feeding them to the final estimator along with the target values. As of now, you cannot transform the target values within such a pipeline.
At the moment, the canonical way to perform a transformation on the target for regression tasks is to use the TransformedTargetRegressor. From the documentation:
"Useful for applying a non-linear transformation to the target y in regression problems."
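To illustrate the non-linear case the documentation mentions, here is a small sketch that uses the func and inverse_func parameters instead of a transformer object. The data and the choice of LinearRegression are assumptions made only for the example; the target is built so that its logarithm is exactly linear in X.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Placeholder data (assumption): a strictly positive target whose log is linear in X.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.exp(X @ np.array([0.5, -0.2, 0.1]) + 1.0)

# Fit on log(y); predictions are mapped back with exp automatically.
log_model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
log_model.fit(X, y)
y_pred = log_model.predict(X)  # on the original (untransformed) scale
```

Either func/inverse_func or transformer can be given, but not both at once.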
You can also pass the pipeline you defined in your question to a TransformedTargetRegressor object and specify a transformer or function that should be applied to the targets y. Here is an example of how you would apply StandardScaler:
from sklearn.compose import TransformedTargetRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
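# Feature scaling and the PLS model, as in the question
pipeline = Pipeline(steps=[('preprocessor', StandardScaler()), ('model', PLSRegression())])
# Wrap the whole pipeline; the keyword is `regressor`, and y is scaled by the transformer
estimator = TransformedTargetRegressor(regressor=pipeline, transformer=StandardScaler())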
You can then pass the estimator object above to GridSearchCV to find the best hyperparameters.
Upvotes: 2