Reputation: 95
How can I save a trained scikit-learn pipeline to a local file? The official documentation covers model persistence here: https://scikit-learn.org/stable/modules/model_persistence.html
But when trying to save a pipeline, I get an error. Example:
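import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

estimators = [
    # The lambda tokenizer is what later triggers the PicklingError.
    ('tfidf', TfidfVectorizer(tokenizer=lambda string: string.split(),
                              min_df=20,
                              max_df=0.75,
                              ngram_range=(1, 1))),
    ('clf', RandomForestClassifier(n_estimators=100,
                                   n_jobs=-1,
                                   class_weight='balanced'))
]
p = Pipeline(estimators)
# x_train and y_train are the training texts and labels (defined elsewhere).
p.fit(x_train, y_train)
model = 'model.joblib'
joblib.dump(p, model)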
However, I get the error message "PicklingError: Can't pickle <function <lambda> at 0x7f4c9f1e50d0>: it's not found as __main__.<lambda>".
How can I solve this problem?
Upvotes: 0
Views: 2694
Reputation: 95
Sorry for asking this question, guys. I have found the solution; quoting the tutorial linked below:
Not everything can be pickled (easily), though: examples of this are generators, inner classes, lambda functions and defaultdicts. In the case of lambda functions, you need to use an additional package named dill. With defaultdicts, you need to create them with a module-level function.
Source: https://www.datacamp.com/community/tutorials/pickle-python-tutorial
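To make the quoted point concrete, here is a minimal sketch using only the standard pickle module (no scikit-learn needed). It assumes the tokenizer only has to split on whitespace, as the lambda in the question does. The names lambda_tokenizer and builtin_tokenizer are made up for illustration.

```python
import pickle

# The tokenizer from the question: an anonymous function. pickle stores
# functions by name, and a lambda has no importable name to look up.
lambda_tokenizer = lambda string: string.split()

try:
    pickle.dumps(lambda_tokenizer)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

# str.split does the same job and can be pickled, because pickle can
# find it again by looking up "split" on the built-in str type.
builtin_tokenizer = str.split
restored = pickle.loads(pickle.dumps(builtin_tokenizer))
tokens = restored("save the trained pipeline")

print(lambda_pickled)  # False
print(tokens)          # ['save', 'the', 'trained', 'pipeline']
```

Under that assumption, passing tokenizer=str.split (or a function defined with def at module level) instead of the lambda should let joblib.dump(p, model) succeed without an extra package. I have only reasoned this through for the whitespace-splitting case. Note that a module-level function is pickled by reference, so the same function must be importable when the model is loaded again. If the lambda has to stay, dill is the route the quote above suggests.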
Upvotes: 1