fsav

Reputation: 13

Word2Vec Vocabulary not defined error

I am new to Python and word2vec, and I keep getting a "you must first build vocabulary before training the model" error. What is wrong with my code?

Here is my code:

file_object=open("SupremeCourt.txt","w")
from gensim.models import word2vec

data = word2vec.Text8Corpus('SupremeCourt.txt')
model = word2vec.Word2Vec(data, size=200)

out=model.most_similar()

print(out[1])
print(out[2])

Upvotes: 1

Views: 297

Answers (2)

Poorna Prudhvi

Reputation: 731

I can see a couple of problems in your code. First, the file is opened in write mode. Second, the model you have loaded doesn't contain the word whose most similar words you want to find. I would suggest either loading a pretrained model such as the Google News vectors in gensim, or building your own word2vec model, so that you won't get the error. The usage of most_similar in gensim is out = model.most_similar("word-name"):

file_object=open("SupremeCourt.txt","r")
from gensim.models import word2vec

data = word2vec.Text8Corpus('SupremeCourt.txt')
model = word2vec.Word2Vec(data, size=200)  # or load the pretrained Google News vectors here instead

out=model.most_similar("word")
print(out)
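
To check ahead of time whether a word would survive into the vocabulary at all, a rough standard-library approximation can help. The sketch below is not gensim's implementation: `surviving_vocabulary` is a hypothetical helper that splits on whitespace and applies a frequency cutoff, assuming gensim's default of `min_count=5`. If it returns an empty set, the corpus most likely has nothing for the model to build a vocabulary from.

```python
from collections import Counter


def surviving_vocabulary(path, min_count=5):
    """Approximate which tokens would be kept after a min_count cutoff.

    Hypothetical helper: whitespace tokenization only, which is an
    assumption and may differ from how gensim reads the corpus.
    """
    with open(path, "r", encoding="utf-8") as f:
        counts = Counter(f.read().split())
    # Keep only tokens that occur at least min_count times.
    return {word for word, count in counts.items() if count >= min_count}
```

For example, `"court" in surviving_vocabulary("SupremeCourt.txt")` gives a quick indication of whether querying that word is likely to work, and an empty result points to an empty or very small file.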

Upvotes: 1

cs95

Reputation: 402323

You're opening that file in write mode with this line:

file_object = open("SupremeCourt.txt", "w")

By doing this, you're erasing the contents of your file, so when you then pass the file to the model for training, there is no data to read and no vocabulary can be built. That's why the error is thrown.

Remove that line (and also restore your file contents), and it'll work.
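
The truncation can be seen without gensim at all. Here is a minimal sketch using only the standard library; `demo_truncation` and the temporary `corpus.txt` file are hypothetical stand-ins for the real script and SupremeCourt.txt.

```python
import os
import tempfile


def demo_truncation():
    """Return the file size before and after opening it in "w" mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.txt")  # stand-in for SupremeCourt.txt

        # Create a small non-empty file.
        with open(path, "w") as f:
            f.write("the court held that the statute was valid\n")
        size_before = os.path.getsize(path)

        # Opening in "w" mode truncates the file immediately,
        # even though nothing is written to it.
        handle = open(path, "w")
        handle.close()
        size_after = os.path.getsize(path)

    return size_before, size_after


before, after = demo_truncation()
print(before, after)
```

The second number printed is 0: the contents are gone as soon as the file is opened for writing, which is why the original text has to be restored before training.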

Upvotes: 1
