Pipeline and cross-validation in Python using scikit-learn

Question

I have a general question about cross-validation.

In the notebook for module 2, it is mentioned that one should use pipelines for cross-validation in order to prevent data leakage. I understand why, but I have a question about the pipeline function:

Suppose I want to use three steps in a pipeline: MinMaxScaler(), PolynomialFeatures() (for multiple degrees), and Ridge() at the end (for multiple alpha values). Since I want to find the best model across multiple parameter values, I will use GridSearchCV(), which performs cross-validation and reports the best model's score.

However, after I initialise a pipeline object with the three steps and pass it to GridSearchCV(), how do I supply the multiple degrees and alpha values in the parameter-grid argument of GridSearchCV()? Do I pass the parameters as a list of two lists, in the order in which the steps were defined in the pipeline object, or do I pass a dictionary of two lists, where the keys are the names of the steps in the pipeline?

Answers (1)
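
One possible way to set this up is sketched below; it is an illustration under stated assumptions, not necessarily the course notebook's approach. In scikit-learn, the `param_grid` argument of GridSearchCV() is a dictionary (or a list of dictionaries), not a list of lists. For a pipeline, each key has the form `<step name>__<parameter name>` with a double underscore, so the order of the steps does not matter for the grid. The step names `scaler`, `poly` and `ridge`, the synthetic data, and the specific degree and alpha values are all made up for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Synthetic regression data, invented purely for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 2))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=60)

# Each step gets a name; these names (hypothetical here) are what
# the parameter grid refers to.
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("poly", PolynomialFeatures()),
    ("ridge", Ridge()),
])

# Keys follow the pattern "<step name>__<parameter name>".
# The candidate values below are arbitrary examples.
param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}

# The scaler and polynomial expansion are refit on the training
# portion of every fold, which is what avoids data leakage.
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # best degree/alpha combination found
print(search.best_score_)   # mean cross-validated score of that combination
```

If the pipeline is built with make_pipeline() instead, the step names are generated automatically from the lowercased class names (for example `polynomialfeatures__degree`), and `pipe.get_params().keys()` lists every valid key.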