Reputation: 45
I have used CountVectorizer() on the following data. It was not picking up single-letter words, so I passed a lambda as the tokenizer:
data = ['E', 'C', 'Employee', 'Child']
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(tokenizer=lambda txt: txt.split())
X = vectorizer.fit_transform(data)
But when I try to dump the model to a joblib file:
dump({'RandomForestClassifier':rfc, 'vectorizer':vectorizer}, 'model_rfc.joblib', compress=1)
it gives a PicklingError:
PicklingError: Can't pickle <function <lambda> at 0x0000026C13C94430>: it's not found as __main__.<lambda>
How can I solve it?
Upvotes: 1
Views: 621
Reputation: 653
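You are using a lambda function as the tokenizer in your CountVectorizer, and lambdas can't be pickled, hence the PicklingError. pickle (which joblib uses) stores a function by its name, and a lambda has no name that can be looked up again when pickling. To get around this, define your tokenization function with def at module level before initializing the vectorizer. For example, define your tokenization function: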
def tokenizer(txt):
    """Whitespace tokenization."""
    return txt.split()
And pass it as a parameter:
vectorizer = CountVectorizer(tokenizer=tokenizer)
You should be able to pickle the vectorizer without any issues after following these steps.
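If you want to see the difference for yourself, here is a minimal sketch using only the standard-library pickle module, which joblib builds on. The names whitespace_tokenizer, lambda_tokenizer and can_pickle are made up for this illustration and are not part of scikit-learn or joblib.

```python
import pickle


def whitespace_tokenizer(txt):
    """Split text on whitespace, keeping single-letter tokens."""
    return txt.split()


# Same behaviour as above, but written as a lambda (hypothetical name).
lambda_tokenizer = lambda txt: txt.split()


def can_pickle(obj):
    """Return True if pickle.dumps accepts obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


named_ok = can_pickle(whitespace_tokenizer)   # named function: stored by reference
lambda_ok = can_pickle(lambda_tokenizer)      # lambda: its name "<lambda>" cannot be looked up

# Round trip the named function and use the restored copy.
restored = pickle.loads(pickle.dumps(whitespace_tokenizer))
tokens = restored("E C Employee Child")

print(named_ok, lambda_ok, tokens)
```

One thing to keep in mind: because the function is stored by name, the script that later loads model_rfc.joblib also needs the same tokenizer function to be importable under the same module path. Putting it in a small module that both the training and the loading script import is one way to handle that.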
Upvotes: 1