I'm trying to set up a comparison of different pipeline steps within a single GridSearchCV run. The only example I have to go by is this one from the scikit-learn documentation; I couldn't find any additional information through a web search.
I have a few questions regarding this example:
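from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', LinearSVC())
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {   # PCA and NMF both take n_components, so they share these values
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {   # SelectKBest uses k instead of n_components
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]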
Here they are swapping in different reduce_dim pipeline steps by passing a list. How do you get finer-grained control of the parameter grid when more than one estimator in the list has the same parameter name but you want to set it for only one of them? In the example, reduce_dim__n_components is a parameter of both PCA() and NMF(). In general, when specifying a list of steps to swap in, how do you set up the parameter grid so that a parameter applies to one particular estimator in the list? Or do you write it a different way?
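A common way to get that control in plain scikit-learn is to give each estimator that needs its own values a separate dict, because each dict in a param_grid list is expanded independently. Another option is to drop the __n_components key and list preconfigured instances directly. A minimal sketch of both, building only the grids (no pipeline or data); the NMF values [4, 16] are hypothetical, chosen only so they differ from PCA's.

```python
from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import ParameterGrid

C_OPTIONS = [1, 10, 100, 1000]

# Option 1: one dict per estimator, so 'reduce_dim__n_components'
# in a dict only applies to the estimator listed in that same dict.
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7)],
        'reduce_dim__n_components': [2, 4, 8],   # PCA-only values
        'classify__C': C_OPTIONS,
    },
    {
        'reduce_dim': [NMF()],
        'reduce_dim__n_components': [4, 16],     # NMF-only values (hypothetical)
        'classify__C': C_OPTIONS,
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': [2, 4, 8],
        'classify__C': C_OPTIONS,
    },
]

# Option 2: no '__n_components' key at all; each candidate step
# is an instance that already carries its own settings.
param_grid_instances = [
    {
        'reduce_dim': [PCA(n_components=2), PCA(n_components=8), NMF(n_components=16)],
        'classify__C': C_OPTIONS,
    },
]

print(len(ParameterGrid(param_grid)))            # 3*4 + 2*4 + 3*4 = 32
print(len(ParameterGrid(param_grid_instances)))  # 3*4 = 12
```

Option 1 keeps the values searchable as separate columns in cv_results_; option 2 is more compact when there are only a few fixed configurations.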
Looking at the initial Pipeline() and steps declaration at the top, does scikit-learn run that too? That is, is it running three comparisons, or is the initial declaration a placeholder and it is running only two?
I'm not sure I understand what you're asking, but perhaps you mean "How can I specify 'classify__C' only once despite varying the dimensionality reduction method?" One answer to that is searchgrid, which allows you to set parameters local to each estimator in a pipeline.
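searchgrid is a separate third-party package and its API is not reproduced here. If you would rather stay with plain scikit-learn, a minimal sketch is to write the shared classify__C setting once and merge it into each step-specific dict with ordinary dict unpacking; the names common and reducer_grids are just illustrative.

```python
from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import ParameterGrid

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]

# Settings shared by every candidate, written once.
common = {'classify__C': C_OPTIONS}

# Step-specific parts; each dict only names its own estimator's parameters.
reducer_grids = [
    {'reduce_dim': [PCA(iterated_power=7), NMF()],
     'reduce_dim__n_components': N_FEATURES_OPTIONS},
    {'reduce_dim': [SelectKBest(chi2)],
     'reduce_dim__k': N_FEATURES_OPTIONS},
]

# Merge the shared settings into each step-specific dict.
param_grid = [{**common, **grid} for grid in reducer_grids]

print(len(ParameterGrid(param_grid)))  # 2*3*4 + 1*3*4 = 36
```

The resulting param_grid is equivalent to the hand-written one in the question, so it can be passed to GridSearchCV unchanged.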
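No. GridSearchCV does not evaluate the initial pipeline on its own; it only fits the combinations listed in param_grid, applying each one to a fresh clone of the pipeline. Because both dicts in the grid replace the reduce_dim step, the initial PCA() is effectively a placeholder, and the search covers the two grids rather than three. Settings the grid doesn't mention, such as LinearSVC's other parameters, are taken from the initial declaration.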
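To see what actually gets evaluated, here is a minimal sketch that imitates, without fitting anything, how each candidate is built: clone the pipeline, then apply one parameter combination with set_params. It mirrors the clone-then-set_params approach GridSearchCV uses internally, but it is an illustration, not scikit-learn's actual search code.

```python
from sklearn.base import clone
from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# The initial declaration from the question.
pipe = Pipeline([('reduce_dim', PCA()), ('classify', LinearSVC())])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {'reduce_dim': [PCA(iterated_power=7), NMF()],
     'reduce_dim__n_components': N_FEATURES_OPTIONS,
     'classify__C': C_OPTIONS},
    {'reduce_dim': [SelectKBest(chi2)],
     'reduce_dim__k': N_FEATURES_OPTIONS,
     'classify__C': C_OPTIONS},
]

# Build (but do not fit) one pipeline per candidate. The parameter values
# are cloned too, so candidates never share the same estimator object.
candidates = []
for params in ParameterGrid(param_grid):
    cloned_params = {k: clone(v, safe=False) for k, v in params.items()}
    candidates.append(clone(pipe).set_params(**cloned_params))

reducers = [est.named_steps['reduce_dim'] for est in candidates]

print(len(candidates))                               # 36
print(sorted({type(r).__name__ for r in reducers}))  # ['NMF', 'PCA', 'SelectKBest']
# Every PCA step comes from the grid (iterated_power=7), never the bare PCA().
print(all(r.iterated_power == 7 for r in reducers if isinstance(r, PCA)))  # True
# 'classify' itself is not in the grid, so every candidate keeps a LinearSVC.
print(all(isinstance(est.named_steps['classify'], LinearSVC) for est in candidates))  # True
```

So the search runs 36 candidates from the two grids, and the placeholder PCA() from the declaration never appears among them.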