Sklearn+Gensim: How to use Gensim's Word2Vec embedding for Sklearn text classification

Question

I am building a multilabel text classification program and I am trying to use OneVsRestClassifier+XGBClassifier to classify the text. Initially I used Sklearn's Tf-Idf Vectorization to vectorize the texts, which worked without error. Now I am using Gensim's Word2Vec to vectorize the texts. When I feed the vectorized data into the OneVsRestClassifier+XGBClassifier however, I get the following error on the line where I split the test and training data:

TypeError: Singleton array array(, dtype=object) cannot be considered a valid collection.

I have tried converting the vectorized data into a feature array (np.array), but that hasn't seemed to work. Below is my code:

x = np.array(Word2Vec(textList, size=120, window=6, min_count=5, workers=7, iter=15))

vectorizer2 = MultiLabelBinarizer()
vectorizer2.fit(tagList)
y = vectorizer2.transform(tagList)

# Split test data and convert test data to arrays
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.20)

The variables textList and tagList are a list of strings (textual descriptions I am trying to classify).

Sklearn+Gensim: How to use Gensim's Word2Vec embedding for Sklearn text classification

Answers (1)

Related Questions

Sklearn+Gensim: How to use Gensim&#39;s Word2Vec embedding for Sklearn text classification

Answers (1)

Related Questions

Sklearn+Gensim: How to use Gensim's Word2Vec embedding for Sklearn text classification