User2603

Reputation: 45

PicklingError: Can't pickle <function <lambda> at 0x0000026C13C94430>: it's not found as __main__.<lambda>

I used CountVectorizer() on the following data. It was not picking up single-letter words, so I passed a lambda as the tokenizer:

data = ['E', 'C', 'Employee', 'Child']

from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(tokenizer=lambda txt: txt.split())
X = vectorizer.fit_transform(data)

But when I try to dump the model to a joblib file:

from joblib import dump
dump({'RandomForestClassifier': rfc, 'vectorizer': vectorizer}, 'model_rfc.joblib', compress=1)

it raises a PicklingError:

PicklingError: Can't pickle <function <lambda> at 0x0000026C13C94430>: it's not found as __main__.<lambda>

How can I solve it?

Upvotes: 1

Views: 621

Answers (1)

A.T.B

Reputation: 653

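You are passing a lambda as the tokenizer of your CountVectorizer, and lambdas can't be pickled: pickle stores a function as a reference to its name, and a lambda has no name that can be looked up again (which is what "not found as __main__.<lambda>" is telling you). Hence the PicklingError. To get around this, define the tokenizer as a regular named function before initializing the vectorizer. For example, define your tokenization function: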

def tokenizer(txt):
    """Whitespace tokenization."""
    return txt.split()

And pass it as a parameter:

vectorizer = CountVectorizer(tokenizer=tokenizer)

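You should be able to pickle the vectorizer without any issues after following these steps. Keep in mind that the tokenizer function has to be importable under the same name in whatever script later loads the file.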
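If you want to see the difference in isolation, here is a minimal sketch that uses only the standard pickle module, without scikit-learn or joblib. It assumes the code runs as a normal script, so the function lives at module level. The name lambda_pickled is just a flag I made up for the illustration.

```python
import pickle


def tokenizer(txt):
    """Whitespace tokenization, same as the function above."""
    return txt.split()


# A module-level function is pickled as a reference to its name,
# so it round-trips as long as that name can be found again on load.
restored = pickle.loads(pickle.dumps(tokenizer))
print(restored("E C Employee Child"))  # ['E', 'C', 'Employee', 'Child']

# A lambda is named "<lambda>", which pickle cannot look up in the module.
try:
    pickle.dumps(lambda txt: txt.split())
    lambda_pickled = True  # hypothetical flag, only used for this demo
except (pickle.PicklingError, AttributeError) as exc:
    lambda_pickled = False
    print(type(exc).__name__, exc)
```

The same reasoning should carry over to the vectorizer: it keeps a reference to whatever you passed as tokenizer, so that object has to be picklable too.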

Upvotes: 1
