Aditya Bihani

Reputation: 37

Pipeline and cross-validation in Python using scikit-learn

I have a general question about cross-validation.

The notebook for module 2 says that one should use pipelines for cross-validation in order to prevent data leakage. I understand why, but I have a question about how the pipeline is used:

Suppose I want three steps in a pipeline: MinMaxScaler(), PolynomialFeatures() (for several degrees), and Ridge() at the end (for several alpha values). Since I want to find the best model across multiple parameter values, I will use GridSearchCV(), which runs cross-validation and reports the best model's score.

However, after I initialise a pipeline object with the three steps and pass it to GridSearchCV(), how do I supply the multiple degrees and alpha values through the param_grid parameter of GridSearchCV()? Do I pass a list of two lists, in the order in which the steps are defined in the pipeline, or a dictionary of two lists whose keys are the names of the steps in the pipeline?

Upvotes: 2

Views: 1254

Answers (1)

Venkatachalam

Reputation: 16966

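You just have to pass it as a dictionary. Each key is the name of a pipeline step, followed by a double underscore and the name of that step's parameter; each value is the list of settings to try. With make_pipeline, the step names are the lowercased class names.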

Try this example:

from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV

X, y = make_regression(random_state=42)

pipe = make_pipeline(MinMaxScaler(), PolynomialFeatures(),  Ridge())

pipe
# Pipeline(steps=[('minmaxscaler', MinMaxScaler()),
#                 ('polynomialfeatures', PolynomialFeatures()),
#                 ('ridge', Ridge())])

gs = GridSearchCV(pipe, param_grid={'polynomialfeatures__degree': [2,4],
                                    'ridge__alpha': [1,10]}).fit(X, y)

gs.best_params_
# {'polynomialfeatures__degree': 2, 'ridge__alpha': 1}
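
If you are unsure which keys are valid, you can ask the pipeline itself. The sketch below is only an illustration and makes a few assumptions that are not in the example above: the step names "scale", "poly" and "model" are arbitrary labels I picked, the grid values are placeholders, and the dataset is a small synthetic one so the search runs quickly. It builds the pipeline with explicit names, lists the keys through get_params(), and checks the grid against them before fitting.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Small synthetic dataset (an assumption for this sketch) so the search stays fast.
X, y = make_regression(n_samples=60, n_features=3, noise=5.0, random_state=0)

# Step names are arbitrary labels; they become the prefixes of the grid keys.
pipe = Pipeline([
    ("scale", MinMaxScaler()),
    ("poly", PolynomialFeatures()),
    ("model", Ridge()),
])

# Every tunable parameter is addressable as "<step name>__<parameter name>".
valid_keys = set(pipe.get_params().keys())
print(sorted(k for k in valid_keys if k.startswith(("poly__", "model__"))))

# Placeholder grid values; replace them with the degrees and alphas you care about.
param_grid = {
    "poly__degree": [1, 2, 3],
    "model__alpha": [0.1, 1.0, 10.0],
}

# Catch typos in the keys early, before any fitting starts.
assert set(param_grid) <= valid_keys

# The scaler and polynomial expansion are refit on the training part of each fold.
gs = GridSearchCV(pipe, param_grid=param_grid, cv=5).fit(X, y)
print(gs.best_params_)
```

Naming the steps yourself keeps the keys short, and since the whole pipeline is refit inside every fold, the scaler never sees the held-out part of the data.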

Upvotes: 1
