lizarisk

Reputation: 7820

How to tune parameters of nested Pipelines by GridSearchCV in scikit-learn?

Is it possible to tune parameters of nested Pipelines in scikit-learn? E.g.:

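from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

svm = Pipeline([
    ('chi2', SelectKBest(chi2)),
    ('cls', LinearSVC(class_weight='balanced'))  # spelled 'auto' in older scikit-learn releases
])

classifier = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('ova_svm', OneVsRestClassifier(svm))
])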

parameters = ?

GridSearchCV(classifier, parameters)

If it's not possible to do this directly, what could be a workaround?

Upvotes: 20

Views: 7263

Answers (2)

dsaj

Reputation: 331

For the estimator you have created, you can list every parameter together with its full (nested) name as follows:

import pprint as pp

pp.pprint(sorted(classifier.get_params().keys()))

['ova_svm', 'ova_svm__estimator', 'ova_svm__estimator__chi2', 'ova_svm__estimator__chi2__k', 'ova_svm__estimator__chi2__score_func', 'ova_svm__estimator__cls', 'ova_svm__estimator__cls__C', 'ova_svm__estimator__cls__class_weight', 'ova_svm__estimator__cls__dual', 'ova_svm__estimator__cls__fit_intercept', 'ova_svm__estimator__cls__intercept_scaling', 'ova_svm__estimator__cls__loss', 'ova_svm__estimator__cls__max_iter', 'ova_svm__estimator__cls__multi_class', 'ova_svm__estimator__cls__penalty', 'ova_svm__estimator__cls__random_state', 'ova_svm__estimator__cls__tol', 'ova_svm__estimator__cls__verbose', 'ova_svm__estimator__steps', 'ova_svm__n_jobs', 'steps', 'vectorizer', 'vectorizer__analyzer', 'vectorizer__binary', 'vectorizer__decode_error', 'vectorizer__dtype', 'vectorizer__encoding', 'vectorizer__input', 'vectorizer__lowercase', 'vectorizer__max_df', 'vectorizer__max_features', 'vectorizer__min_df', 'vectorizer__ngram_range', 'vectorizer__norm', 'vectorizer__preprocessor', 'vectorizer__smooth_idf', 'vectorizer__stop_words', 'vectorizer__strip_accents', 'vectorizer__sublinear_tf', 'vectorizer__token_pattern', 'vectorizer__tokenizer', 'vectorizer__use_idf', 'vectorizer__vocabulary']

From this list you can pick the parameters you want to search over and use their names as keys in the parameter grid passed to GridSearchCV.
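As a rough illustration of that idea, the sketch below rebuilds the pipeline from the question and checks a candidate grid against the names returned by get_params() before any search is run. The grid values and the variable names (inner, candidate_grid, unknown) are assumptions made for this example only; LinearSVC is left at its defaults to keep the sketch short.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Same nesting as in the question: a Pipeline wrapped by OneVsRestClassifier,
# which is itself a step of an outer Pipeline.
inner = Pipeline([("chi2", SelectKBest(chi2)), ("cls", LinearSVC())])
classifier = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("ova_svm", OneVsRestClassifier(inner)),
])

# Hypothetical grid; the values are placeholders, not recommendations.
candidate_grid = {
    "ova_svm__estimator__cls__C": [1, 10, 100],
    "ova_svm__estimator__chi2__k": [200, 500, 1000],
}

# Every key of the grid should appear among the names reported by get_params().
valid_names = set(classifier.get_params().keys())
unknown = sorted(name for name in candidate_grid if name not in valid_names)
print("unknown parameter names:", unknown)

# set_params() accepts the same nested names, which is what the search
# relies on when it tries each candidate.
classifier.set_params(ova_svm__estimator__cls__C=10)
print(classifier.get_params()["ova_svm__estimator__cls__C"])
```

A misspelled key (for example a single underscore where a double one is needed) would show up in the unknown list here instead of failing later inside the search.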

Upvotes: 23

Fred Foo

Reputation: 363487

scikit-learn has a double underscore notation for this, as exemplified here. It works recursively and extends to OneVsRestClassifier, with the caveat that the underlying estimator must be explicitly addressed as __estimator:

parameters = {'ova_svm__estimator__cls__C': [1, 10, 100],
              'ova_svm__estimator__chi2__k': [200, 500, 1000]}
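To show how such a grid plugs into the search, here is a minimal end-to-end sketch on a tiny made-up corpus. The documents, labels, cv=2, and the small k and C values are all assumptions chosen so the example runs on a dozen sentences; they are not tuning advice. On real data k would be much larger, as in the grid above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus (invented for illustration): three classes, four documents each.
docs = [
    "the cat sat on the mat", "cats and kittens purr softly",
    "my cat chases the mouse", "a kitten naps in the sun",
    "the dog barked at the mailman", "dogs fetch sticks and balls",
    "my dog wags its tail", "a puppy chews the shoe",
    "the bird sang in the tree", "birds fly south in winter",
    "my parrot repeats every word", "a sparrow builds its nest",
]
labels = ["cat"] * 4 + ["dog"] * 4 + ["bird"] * 4

svm = Pipeline([("chi2", SelectKBest(chi2)), ("cls", LinearSVC())])
classifier = Pipeline([
    ("vectorizer", TfidfVectorizer()),
    ("ova_svm", OneVsRestClassifier(svm)),
])

# Nested names: <outer step>__estimator__<inner step>__<parameter>.
# k is kept small because the toy vocabulary has only a few dozen terms.
parameters = {
    "ova_svm__estimator__cls__C": [1, 10],
    "ova_svm__estimator__chi2__k": [3, 5],
}

search = GridSearchCV(classifier, parameters, cv=2)
search.fit(docs, labels)

print(search.best_params_)
print(len(search.cv_results_["params"]), "parameter combinations evaluated")
```

After fitting, search.best_estimator_ is the whole outer pipeline refit with the selected values, so it can be applied to raw text directly.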

Upvotes: 30
