gprinz

Save a scikit-learn pipeline to a file

How can I save a trained scikit-learn pipeline to a local file? I followed the official documentation on model persistence: https://scikit-learn.org/stable/modules/model_persistence.html

However, when I try to save a pipeline, I get an error. Example:

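import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# x_train (documents as strings) and y_train (labels) are defined elsewhere.
estimators = [
    ('tfidf', TfidfVectorizer(tokenizer=lambda string: string.split(),
                              min_df=20,
                              max_df=0.75,
                              ngram_range=(1, 1))),
    ('clf', RandomForestClassifier(n_estimators=100,
                                   n_jobs=-1,
                                   class_weight='balanced'))
]

p = Pipeline(estimators)
p.fit(x_train, y_train)

model = 'model.joblib'
joblib.dump(p, model)  # this call raises the error below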

The call to joblib.dump fails with this error message: 'PicklingError: Can't pickle <function <lambda> at 0x7f4c9f1e50d0>: it's not found as __main__.<lambda>'.
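For reference, the same failure can be reproduced with the standard library alone, since joblib serializes objects with pickle under the hood. This is a minimal sketch; the name tokenize is illustrative, and the exact wording of the printed error may differ from what joblib reports:

```python
import pickle

# Same shape as the tokenizer in the pipeline: an anonymous function.
tokenize = lambda string: string.split()

try:
    pickle.dumps(tokenize)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError) as exc:
    # pickle saves functions by reference (module + qualified name); the
    # qualified name of a lambda is "<lambda>", which cannot be looked up again.
    lambda_pickled = False
    print(type(exc).__name__, exc)

print(lambda_pickled)  # False
```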

How can I solve this problem?

Answers (1)

gprinz

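Sorry for asking this question, guys. I have found the solution: pickle, which joblib uses under the hood, cannot serialize the lambda passed as tokenizer to TfidfVectorizer, so the simplest fix is to make the tokenizer a named function defined at module level. From a pickle tutorial: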

Not everything can be pickled (easily), though: examples of this are generators, inner classes, lambda functions and defaultdicts. In the case of lambda functions, you need to use an additional package named dill. With defaultdicts, you need to create them with a module-level function.

Source: https://www.datacamp.com/community/tutorials/pickle-python-tutorial
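A sketch of that fix, using only the standard library; whitespace_tokenizer and zero are hypothetical names chosen for illustration. A named function defined at module level is pickled as a reference to its module and name, so it round-trips, and the same applies to the default factory of a defaultdict (the built-in int would also work as the factory here):

```python
import pickle
from collections import defaultdict


def whitespace_tokenizer(string):
    """Module-level replacement for `lambda string: string.split()`."""
    return string.split()


def zero():
    """Module-level default factory, usable instead of `lambda: 0`."""
    return 0


# A named, module-level function is pickled by reference and round-trips.
restored_tokenizer = pickle.loads(pickle.dumps(whitespace_tokenizer))
tokens = restored_tokenizer("save the pipeline")

# Same idea for defaultdict: its default factory must be picklable too.
counts = defaultdict(zero)
counts["tfidf"] += 1
restored_counts = pickle.loads(pickle.dumps(counts))

print(tokens)                 # ['save', 'the', 'pipeline']
print(dict(restored_counts))  # {'tfidf': 1}
```

In the pipeline, this means passing tokenizer=whitespace_tokenizer to TfidfVectorizer instead of the lambda. Because only the reference is stored, whatever script later calls joblib.load must be able to find whitespace_tokenizer under the same module name; putting it in a small importable module, rather than in the training script's __main__, avoids errors at load time.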
