Reputation: 169
I'm attempting to compare a tagged document consisting of a list of words to individual tags from a list of tags.
My code is as follows:
from gensim.models.doc2vec import Doc2Vec
from gensim import similarities,corpora,models
import Load
documents = Load.get_doc('docs')
data = Doc2Vec.load('vectorised.model')
print('Data Loading finished')
tags = [['word1'],['word2'],['word3'],['word4'],['word5']]
tag_vectors = []
data.n_similarity(tags[0],documents[1])
The issue I'm having is that running:
data.n_similarity(tags[0],documents[1])
raises KeyError: "word 'otherword' not in vocabulary"
I want to get the similarity between the TaggedDocument and the tag itself, so my question is: what do I need to change in my code so that it checks correctly and returns a similarity value?
N.B. I've replaced the actual words here with placeholders.
Upvotes: 1
Views: 453
Reputation: 26
I think you should first check whether the word named in the KeyError is in the vocabulary of 'vectorised.model'. If the model does not have the word, you can do some incremental training, like:
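model = Doc2Vec.load('vectorised.model')  # your old model
# text: an iterable of TaggedDocument objects that contain the missing words
model.build_vocab(text, update=True)  # update your vocab
model.train(text, total_examples=model.corpus_count, epochs=model.epochs)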
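If retraining is not an option, another way to avoid the KeyError is to drop the unknown words before asking for a similarity. The following is only a rough sketch: known_words and safe_n_similarity are hypothetical helper names, and kv is assumed to be the model's word vectors (data.wv in the question), which support the "in" check and n_similarity.

```python
def known_words(kv, words):
    """Keep only the words that kv has a vector for."""
    return [w for w in words if w in kv]


def safe_n_similarity(kv, words_a, words_b):
    """Similarity between two word lists, skipping out-of-vocabulary words.

    Returns None when either list has no known words left, because
    there is nothing to compare in that case.
    """
    a = known_words(kv, words_a)
    b = known_words(kv, words_b)
    if not a or not b:
        return None
    return kv.n_similarity(a, b)


# Hypothetical usage with the objects from the question, assuming
# documents[1] is a TaggedDocument (its word list is in .words):
# sim = safe_n_similarity(data.wv, tags[0], documents[1].words)
```

Note that this silently ignores words the model has never seen, so a None result or a score based on very few words should be treated with care.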
Upvotes: 1