Filip Szczybura

Reputation: 437

Combine two sklearn pipelines into one

I have a text preprocessing Pipeline:

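from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline

pipe = Pipeline([
  ('count_vectorizer', CountVectorizer()),
  ('chi2score', SelectKBest(chi2, k=1000)),
  ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])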

and I want to cross-validate it with several different estimators appended as the final step. The following solution works, but honestly I don't really like it; there should be a better way to do it. Maybe the Pipeline could somehow be converted to a transformer?

pipe_nb = Pipeline([*pipe.steps, ('naive_bayes', MultinomialNB())])

The approach below is the one I would consider ideal, but unfortunately it does not merge the steps into the new pipeline (the original pipeline ends up as a single nested step), which causes issues.

pipe_nb = make_pipeline(
  pipe, 
  MultinomialNB()
)
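To illustrate the difference between the two attempts, here is a minimal sketch that builds both and compares their step names and parameter paths. The variable names `nested` and `flat` are introduced only for this illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])

# Passing the pipeline itself: it becomes ONE step of the new pipeline.
nested = make_pipeline(pipe, MultinomialNB())

# Unpacking pipe.steps: every (name, estimator) pair becomes its own step.
flat = Pipeline([*pipe.steps, ('naive_bayes', MultinomialNB())])

nested_names = [name for name, _ in nested.steps]
flat_names = [name for name, _ in flat.steps]
print(nested_names)
print(flat_names)

# Parameter paths differ too, which matters for set_params or grid search.
print('pipeline__chi2score__k' in nested.get_params())
print('chi2score__k' in flat.get_params())
```

In the nested version, every preprocessing parameter gains an extra `pipeline__` prefix, which is one of the ways the nesting can get in the way.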

How can I merge two pipelines into one in a clean, Pythonic way?

Upvotes: 2

Views: 1804

Answers (1)

Corralien

Reputation: 120559

Why not simply append a new step to pipe.steps instead of creating a new pipeline? Note that this modifies pipe in place.

pipe.steps.append(('naive_bayes', MultinomialNB()))
print(pipe)

# Output
Pipeline(steps=[('count_vectorizer', CountVectorizer()),
                ('chi2score', SelectKBest(k=1000, score_func=1)),
                ('tfidf_transformer', TfidfTransformer()),
                ('naive_bayes', MultinomialNB())])
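Because appending changes `pipe` itself, it may be inconvenient when the same preprocessing has to be reused with several estimators. A possible non-mutating variant is sketched below; `extend_pipeline` is a hypothetical helper written for this example, not part of scikit-learn.

```python
from sklearn.base import clone
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=1000)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])


def extend_pipeline(base, name, estimator):
    """Return a new Pipeline: cloned steps of `base` plus one final step.

    Hypothetical helper; `base` is left untouched and the new pipeline
    does not share estimator instances with it.
    """
    steps = [(step_name, clone(step)) for step_name, step in base.steps]
    return Pipeline(steps + [(name, estimator)])


pipe_nb = extend_pipeline(pipe, 'naive_bayes', MultinomialNB())
print([name for name, _ in pipe_nb.steps])
print(len(pipe.steps))  # the original pipeline still has its three steps
```

The original step names are kept, so parameter paths such as `chi2score__k` stay the same in the extended pipeline.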

Or:

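# Don't forget the *: it unpacks the pipeline into its individual estimators,
# and make_pipeline then generates new step names from the class names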
pipe2 = make_pipeline(*pipe, MultinomialNB())
print(pipe2)

# Output
Pipeline(steps=[('countvectorizer', CountVectorizer()),
                ('selectkbest', SelectKBest(k=1000, score_func=1)),
                ('tfidftransformer', TfidfTransformer()),
                ('multinomialnb', MultinomialNB())])
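For the cross-validation use case from the question, the unpacking approach can be put in a loop over candidate estimators. The following is only a sketch under stated assumptions: the tiny corpus, the labels, `k=3` (instead of 1000, since the toy vocabulary is small), `cv=2`, and the choice of `LogisticRegression` as a second estimator are all made up for illustration.

```python
from sklearn.base import clone
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline, make_pipeline

# Toy data (assumption): four positive and four negative documents.
docs = [
    "good great film", "great fun movie", "good fun film", "great good movie",
    "bad boring film", "boring awful movie", "bad awful film", "awful boring movie",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same structure as the question's pipeline, with a small k for the toy corpus.
pipe = Pipeline([
    ('count_vectorizer', CountVectorizer()),
    ('chi2score', SelectKBest(chi2, k=3)),
    ('tfidf_transformer', TfidfTransformer(norm='l2', use_idf=True)),
])

estimators = {
    'naive_bayes': MultinomialNB(),
    'log_reg': LogisticRegression(),
}

results = {}
for label, estimator in estimators.items():
    # clone(pipe) gives each candidate its own copy of the preprocessing steps.
    full = make_pipeline(*clone(pipe), estimator)
    scores = cross_val_score(full, docs, labels, cv=2)
    results[label] = scores.mean()

for label, mean_score in results.items():
    print(label, mean_score)
```

Since `cross_val_score` refits the whole pipeline on each training fold, the vectorizer and feature selection are learned only from the training part of each split.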

Upvotes: 3
