hermidalc

Reputation: 568

Confusion on how to set parameter grid when comparing multiple pipeline steps in a single GridSearchCV run

I'm trying to set up a comparison of different pipeline steps within a single GridSearchCV run. The only example I have to go by is this one from the scikit-learn documentation; I couldn't find any additional information through a web search.

http://scikit-learn.org/stable/auto_examples/plot_compare_reduction.html#sphx-glr-auto-examples-plot-compare-reduction-py

I have a few questions regarding this example:

pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', LinearSVC())
])

N_FEATURES_OPTIONS = [2, 4, 8]
C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        'reduce_dim': [PCA(iterated_power=7), NMF()],
        'reduce_dim__n_components': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': N_FEATURES_OPTIONS,
        'classify__C': C_OPTIONS
    },
]

  1. Here they swap in different reduce_dim pipeline steps by passing a list. How do you get finer-grained control of the parameter grid when more than one estimator in the list has the same parameter name but you want to set it for only one of them? In the example, reduce_dim__n_components is a parameter of both PCA() and NMF(). In general, when specifying a list of steps to swap in, how do you set up the parameter grid to specify parameters for a particular estimator in the list? Or do you write it a different way?

  2. Looking at the initial Pipeline() and steps declaration at the top, does scikit-learn run that? That is, is it running three comparisons, or is the initial declaration a placeholder and it is running two?

Upvotes: 1

Views: 655

Answers (1)

joeln

Reputation: 3633

  1. I'm not sure I understand what you're asking, but perhaps you mean "How can I specify 'classify__C' only once despite varying the dimensionality reduction method?" One answer to that is searchgrid, which allows you to set parameters local to each estimator in a pipeline.

  2. No, it does not do anything with the initial parameters until they are modified by the parameter grid. So the initial declaration is effectively a placeholder.
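
If the goal in question 1 is to give one estimator its own values for a parameter name it shares with another, one possible approach with plain scikit-learn is to put each estimator in its own dict of the param_grid list. The sketch below assumes that reading of the question. The NMF values [4, 8] are made up for illustration and are not from the scikit-learn example. ParameterGrid is used only to enumerate the candidates that GridSearchCV would try; nothing is fit here.

```python
from collections import Counter

from sklearn.decomposition import NMF, PCA
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# The steps declared here act as placeholders: every dict below
# replaces 'reduce_dim' before anything is fit.
pipe = Pipeline([
    ('reduce_dim', PCA()),
    ('classify', LinearSVC())
])

C_OPTIONS = [1, 10, 100, 1000]
param_grid = [
    {
        # PCA gets its own n_components values
        'reduce_dim': [PCA(iterated_power=7)],
        'reduce_dim__n_components': [2, 4, 8],
        'classify__C': C_OPTIONS
    },
    {
        # NMF gets different values (hypothetical, for illustration only)
        'reduce_dim': [NMF()],
        'reduce_dim__n_components': [4, 8],
        'classify__C': C_OPTIONS
    },
    {
        'reduce_dim': [SelectKBest(chi2)],
        'reduce_dim__k': [2, 4, 8],
        'classify__C': C_OPTIONS
    },
]

# Enumerate the parameter settings that the search would evaluate.
candidates = list(ParameterGrid(param_grid))
n_by_step = Counter(type(c['reduce_dim']).__name__ for c in candidates)

# Every candidate sets 'reduce_dim', so the bare PCA() above is never fit as-is.
always_overridden = all('reduce_dim' in c for c in candidates)

# Constructing the search does not fit anything; call search.fit(X, y) to run it.
search = GridSearchCV(pipe, param_grid=param_grid, cv=5)
```

The cost of this layout is that 'classify__C' has to be repeated in every dict, which is the repetition that searchgrid is meant to avoid.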

Upvotes: 2
