Reputation: 7820
Is it possible to tune parameters of nested Pipelines in scikit-learn? E.g.:
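from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Inner pipeline: chi-squared feature selection followed by a linear SVM.
svm = Pipeline([
    ('chi2', SelectKBest(chi2)),
    ('cls', LinearSVC(class_weight='balanced'))  # 'auto' in older scikit-learn releases
])
# Outer pipeline: TF-IDF features fed to a one-vs-rest wrapper around the inner pipeline.
classifier = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('ova_svm', OneVsRestClassifier(svm))
])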
parameters = ?
GridSearchCV(classifier, parameters)
If it's not possible to do this directly, what could be a workaround?
Upvotes: 20
Views: 7263
Reputation: 331
For the estimator you have created, you can list every tunable parameter, including the nested ones with their full double-underscore names, as follows.
import pprint as pp
pp.pprint(sorted(classifier.get_params().keys()))
['ova_svm', 'ova_svm__estimator', 'ova_svm__estimator__chi2', 'ova_svm__estimator__chi2__k', 'ova_svm__estimator__chi2__score_func', 'ova_svm__estimator__cls', 'ova_svm__estimator__cls__C', 'ova_svm__estimator__cls__class_weight', 'ova_svm__estimator__cls__dual', 'ova_svm__estimator__cls__fit_intercept', 'ova_svm__estimator__cls__intercept_scaling', 'ova_svm__estimator__cls__loss', 'ova_svm__estimator__cls__max_iter', 'ova_svm__estimator__cls__multi_class', 'ova_svm__estimator__cls__penalty', 'ova_svm__estimator__cls__random_state', 'ova_svm__estimator__cls__tol', 'ova_svm__estimator__cls__verbose', 'ova_svm__estimator__steps', 'ova_svm__n_jobs', 'steps', 'vectorizer', 'vectorizer__analyzer', 'vectorizer__binary', 'vectorizer__decode_error', 'vectorizer__dtype', 'vectorizer__encoding', 'vectorizer__input', 'vectorizer__lowercase', 'vectorizer__max_df', 'vectorizer__max_features', 'vectorizer__min_df', 'vectorizer__ngram_range', 'vectorizer__norm', 'vectorizer__preprocessor', 'vectorizer__smooth_idf', 'vectorizer__stop_words', 'vectorizer__strip_accents', 'vectorizer__sublinear_tf', 'vectorizer__token_pattern', 'vectorizer__tokenizer', 'vectorizer__use_idf', 'vectorizer__vocabulary']
Any name in this list can be used as a key in the parameter grid that you pass to GridSearchCV.
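One way to put that list to work is to check a grid against it before starting a search. This is a minimal sketch assuming a recent scikit-learn. `unknown_grid_keys` is a hypothetical helper name, not part of the library, and the pipeline is rebuilt here (with default `LinearSVC` settings) only so the snippet runs on its own:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

svm = Pipeline([('chi2', SelectKBest(chi2)), ('cls', LinearSVC())])
classifier = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('ova_svm', OneVsRestClassifier(svm)),
])


def unknown_grid_keys(estimator, grid):
    """Return the grid keys the estimator does not expose (hypothetical helper)."""
    valid = estimator.get_params(deep=True)
    return sorted(key for key in grid if key not in valid)


grid = {
    'ova_svm__estimator__cls__C': [1, 10, 100],  # valid nested name
    'ova_svm__estimator__chi2_k': [200, 500],    # typo: single underscore before k
}
# Prints the keys that would be rejected when the search sets parameters.
print(unknown_grid_keys(classifier, grid))
```

An empty result means every key in the grid matches a name reported by `get_params()`.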
Upvotes: 23
Reputation: 363487
scikit-learn has a double underscore notation for this, as exemplified here. It works recursively and extends to OneVsRestClassifier, with the caveat that the underlying estimator must be explicitly addressed as __estimator:
parameters = {'ova_svm__estimator__cls__C': [1, 10, 100],
              'ova_svm__estimator__chi2__k': [200, 500, 1000]}
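To see the notation work end to end, here is a minimal sketch assuming a recent scikit-learn, where `GridSearchCV` is imported from `sklearn.model_selection`. The toy documents, the labels, the small `k` values and `cv=2` are made up for illustration and are not from the question:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Made-up corpus: three classes with four short documents each.
docs = [
    "the striker scored a late goal",
    "the goalkeeper saved the penalty",
    "the team won the football match",
    "the coach praised the striker",
    "the senate passed the budget bill",
    "the president signed the new law",
    "voters elected a new senate",
    "the minister proposed a tax law",
    "the new phone has a fast chip",
    "the laptop ships with more memory",
    "the chip maker released a new phone",
    "software update improves laptop memory",
]
labels = ['sport'] * 4 + ['politics'] * 4 + ['tech'] * 4

svm = Pipeline([('chi2', SelectKBest(chi2)), ('cls', LinearSVC())])
classifier = Pipeline([
    ('vectorizer', TfidfVectorizer()),
    ('ova_svm', OneVsRestClassifier(svm)),
])

# Step name, then 'estimator' for the one-vs-rest wrapper, then the inner
# pipeline's step name, then the parameter, all joined by double underscores.
parameters = {
    'ova_svm__estimator__cls__C': [0.1, 1, 10],
    'ova_svm__estimator__chi2__k': [2, 5],  # kept tiny because the toy vocabulary is tiny
}

search = GridSearchCV(classifier, parameters, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

On real data you would use larger `k` values such as those in the grid above, together with a proper cross-validation setup.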
Upvotes: 30