Reputation: 437
I have a text preprocessing Pipeline:
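# Imports needed by the snippets in this question
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])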
and I want to cross-validate it with several different final estimators. The following solution works, but honestly I don't really like it; there should be a better way. Maybe the Pipeline could somehow be converted to a transformer?
pipe_nb = Pipeline([*pipe.steps, ('naive_bayes', MultinomialNB())])
The approach below is the one I would consider ideal, but unfortunately it does not merge the steps into the new pipeline (the whole pipe becomes a single nested step), which causes issues:
pipe_nb = make_pipeline(
pipe,
MultinomialNB()
)
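To make the nesting concrete, here is a minimal sketch that only builds the objects and inspects them, without fitting anything. The variable names `nested` and `nested_names` are mine, used purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])

# Passing the Pipeline object itself makes it ONE step of the outer pipeline
nested = make_pipeline(pipe, MultinomialNB())
nested_names = [name for name, _ in nested.steps]
print(nested_names)  # ['pipeline', 'multinomialnb']

# The first step is the whole preprocessing Pipeline, not its three stages
print(isinstance(nested.steps[0][1], Pipeline))  # True

# Inner parameters therefore need an extra 'pipeline__' prefix,
# e.g. when building a parameter grid for a grid search
print('pipeline__chi2score__k' in nested.get_params())  # True
```

So the nested version is still a valid estimator, but every parameter name gains a level, which is likely one of the "issues" when tuning or inspecting steps.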
How can I merge two pipelines into one in a clean, Pythonic way?
Upvotes: 2
Views: 1804
Reputation: 120559
Why not simply append a new step to pipe.steps rather than creating a new pipeline?
pipe.steps.append(('naive_bayes', MultinomialNB()))
print(pipe)
# Output
Pipeline(steps=[('count_vectorizer', CountVectorizer()),
('chi2score', SelectKBest(k=1000, score_func=<function chi2 at 0x...>)),
('tfidf_transformer', TfidfTransformer()),
('naive_bayes', MultinomialNB())])
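One caveat with appending: it mutates pipe in place, so the preprocessing pipeline can no longer be reused for a second classifier. If you want to try several estimators, a possible workaround is to clone the pipeline first. This is only a sketch: `with_classifier` is a hypothetical helper name, and `LogisticRegression` is just an arbitrary second estimator I picked for illustration.

```python
from sklearn.base import clone
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])


def with_classifier(preprocess, name, clf):
    """Hypothetical helper: unfitted copy of `preprocess` with `clf` appended."""
    combined = clone(preprocess)        # new Pipeline with its own steps list
    combined.steps.append((name, clf))  # the original pipeline is left untouched
    return combined


# Arbitrary example estimators; replace with whatever you want to compare
candidates = {
    'naive_bayes': MultinomialNB(),
    'log_reg': LogisticRegression(max_iter=1000),
}
pipelines = {
    name: with_classifier(pipe, name, clf) for name, clf in candidates.items()
}

for name, p in pipelines.items():
    print(name, [step_name for step_name, _ in p.steps])
```

Each entry of `pipelines` can then be passed to `cross_val_score` together with your own documents and labels (not shown here), while pipe itself keeps its original three steps.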
Or:
# Don't forget the *
pipe2 = make_pipeline(*pipe, MultinomialNB())
print(pipe2)
# Output
Pipeline(steps=[('countvectorizer', CountVectorizer()),
('selectkbest', SelectKBest(k=1000, score_func=<function chi2 at 0x...>)),
('tfidftransformer', TfidfTransformer()),
('multinomialnb', MultinomialNB())])
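Note that make_pipeline generates step names from the class names, so the custom names from pipe are lost. If you rely on those names (for example in a parameter grid), unpacking pipe.steps into the Pipeline constructor keeps them. The sketch below compares both; `pipe3`, `auto_names`, `kept_names` and `shared` are illustrative variable names of mine.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])

# Unpacking the Pipeline yields its estimators; names are regenerated
pipe2 = make_pipeline(*pipe, MultinomialNB())
auto_names = [name for name, _ in pipe2.steps]
print(auto_names)
# ['countvectorizer', 'selectkbest', 'tfidftransformer', 'multinomialnb']

# Unpacking pipe.steps yields (name, estimator) tuples; names are kept
pipe3 = Pipeline([*pipe.steps, ('naive_bayes', MultinomialNB())])
kept_names = [name for name, _ in pipe3.steps]
print(kept_names)
# ['count_vectorizer', 'chi2score', 'tfidf_transformer', 'naive_bayes']

# Neither variant copies the estimators: the objects are shared with pipe
shared = pipe2.steps[0][1] is pipe.steps[0][1]
print(shared)  # True
```

Because the estimator objects are shared, fitting pipe2 or pipe3 also fits the steps inside pipe. Wrapping pipe in sklearn.base.clone before unpacking should avoid that if it matters for your use case.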
Upvotes: 3