Gensim n_similarity word not in vocabulary

Question

I'm attempting to compare a tagged document consisting of a list of words to individual tags from a list of tags.

My code is as follows:

from gensim.models.doc2vec import Doc2Vec
from gensim import similarities,corpora,models
import Load

documents = Load.get_doc('docs')

data = Doc2Vec.load('vectorised.model')

print('Data Loading finished')

tags = [['word1'],['word2'],['word3'],['word4'],['word5']]

tag_vectors = []

data.n_similarity(tags[0],documents[1])

The issue I'm having is that running:

data.n_similarity(tags[0],documents[1])

raises KeyError: "word 'otherword' not in vocabulary"

I want to get the similarity between the tagged document and the tag itself. What do I need to change in my code so that it handles this correctly and returns a similarity value?

N.B. I've replaced the actual words here with placeholders.
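Roughly speaking, n_similarity looks up the vector of every word in both lists, averages each list's vectors, and returns the cosine similarity of the two averages, so a single word the model has never seen is enough to raise a KeyError. The pure-Python sketch below only illustrates that computation; it does not use gensim, and the `word_vectors` dict with its toy values is a hypothetical stand-in for a trained model's vocabulary.

```python
import math

# Hypothetical toy vectors standing in for a trained model's word vectors.
word_vectors = {
    "word1": [0.1, 0.3, 0.5],
    "cat": [0.2, 0.1, 0.4],
    "dog": [0.3, 0.2, 0.3],
}


def mean_vector(words, vectors):
    """Average the vectors of `words`; an unknown word raises KeyError."""
    rows = [vectors[w] for w in words]  # plain dict lookup fails on out-of-vocabulary words
    return [sum(col) / len(rows) for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def n_similarity(ws1, ws2, vectors):
    """Cosine similarity between the mean vectors of two word lists."""
    return cosine(mean_vector(ws1, vectors), mean_vector(ws2, vectors))


print(n_similarity(["word1"], ["cat", "dog"], word_vectors))

try:
    # "otherword" is not in the toy vocabulary, mirroring the error in the question.
    n_similarity(["word1"], ["cat", "otherword"], word_vectors)
except KeyError as err:
    print("missing word:", err)
```

The point is that the lookup happens for every word in both lists, so the fix is either to make sure every word is in the vocabulary or to leave unknown words out before calling n_similarity.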

Youyizuopig · Accepted Answer

I think you should first check whether the word from the KeyError is in 'vectorised.model'. If the model doesn't have the word, you can do some incremental training, like this:

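from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec.load('vectorised.model')  # your existing model
model.build_vocab(text, update=True)  # text: TaggedDocuments containing the new words; updates the vocab
model.train(text, total_examples=model.corpus_count, epochs=model.epochs)  # train on the new documents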
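If retraining isn't an option, the check itself can be done in code: keep only the words the model knows and pass those to n_similarity. A minimal sketch, assuming the Doc2Vec model's word vectors are available as `model.wv` (gensim KeyedVectors, which supports `word in model.wv` and `model.wv.n_similarity`); the helper names `in_vocab` and `safe_n_similarity` are hypothetical and not part of gensim.

```python
def in_vocab(words, wv):
    """Keep only the words the model has a vector for (`wv` must support `word in wv`)."""
    return [w for w in words if w in wv]


def safe_n_similarity(wv, ws1, ws2):
    """Call wv.n_similarity on the in-vocabulary words only.

    Returns None when either list has no known words left, because
    n_similarity needs at least one vector on each side.
    """
    known1 = in_vocab(ws1, wv)
    known2 = in_vocab(ws2, wv)
    if not known1 or not known2:
        return None
    return wv.n_similarity(known1, known2)
```

With the question's variables this would be called as `safe_n_similarity(data.wv, tags[0], documents[1])`. Note that dropping words silently changes what is being compared, so it is worth logging them first, e.g. `[w for w in documents[1] if w not in data.wv]`.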
