John Sall

Reputation: 1151

I get an isnan error when I merge two CountVectorizers

I'm doing dialect text classification and I have this code:

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

vectorizerN = CountVectorizer(analyzer='char',ngram_range=(3,4))
XN = vectorizerN.fit_transform(X_train)

vectorizerMX = CountVectorizer(vocabulary=a['vocabs'])
MX = vectorizerMX.fit_transform(X_train)

from sklearn.pipeline import FeatureUnion
combined_features = FeatureUnion([('CountVectorizer', MX),('CountVect', XN)])
combined_features.transform(test_data)

When I run this code I get this error:

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I was following the code in this post: Merging CountVectorizer in Scikit-Learn feature extraction

Also, how can I train and predict afterwards?

Upvotes: 0

Views: 91

Answers (1)

Aris F.

Reputation: 1117

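FeatureUnion takes a list of (name, transformer) pairs, so you should union the vectorizers themselves (vectorizerN and vectorizerMX), not MX and XN, which are the matrices returned by fit_transform. Change the line to: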

combined_features = FeatureUnion([('CountVectorizer', vectorizerMX), ('CountVect', vectorizerN)])
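
As for training and predicting afterwards, one option is to put the union and the classifier in a Pipeline, so a single fit call fits both vectorizers and the model. This is only a minimal sketch: the toy sentences, the labels, and the vocabs list are made-up stand-ins for your X_train, y_train, test_data, and a['vocabs'], and the step names are arbitrary.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy data standing in for X_train, y_train and test_data.
X_train = [
    "howdy partner how are things",
    "hello how are you today",
    "howdy folks nice to see you",
    "hello good day to you",
]
y_train = ["southern", "standard", "southern", "standard"]
test_data = ["howdy there partner", "hello there"]

# Hypothetical fixed vocabulary standing in for a['vocabs'].
vocabs = ["howdy", "hello", "partner", "you"]

# The union receives the vectorizer objects, not their output matrices.
combined_features = FeatureUnion([
    ("word_vocab", CountVectorizer(vocabulary=vocabs)),
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(3, 4))),
])

# Chaining the union with the classifier lets one fit() train everything.
model = Pipeline([
    ("features", combined_features),
    ("clf", MultinomialNB()),
])

model.fit(X_train, y_train)             # fits both vectorizers and the classifier
predictions = model.predict(test_data)  # one predicted label per test document
print(predictions)
```

If you would rather keep the steps separate, combined_features.fit_transform(X_train) gives the stacked training matrix and combined_features.transform(test_data) gives the matching test matrix, which you can pass to MultinomialNB yourself.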

Upvotes: 1
