Reputation: 10214
I am using gensim's Doc2Vec to learn features from news articles. I can successfully train a model on my documents. However, I struggle to retrieve the document vectors from the model for further processing.
Example code (directly taken from gensim's documentation):
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.test.utils import common_texts
documents = [TaggedDocument((doc), [i]) for i, doc in enumerate(common_texts)]
model = Doc2Vec(documents, vector_size=5, window=2, min_count=1, workers=4)
This correctly trains without error.
If I try to use model.docvecs directly or iterate over it like so:
for vector in model.docvecs:
    print(vector)
I get this error:
KeyError: "tag '9' not seen in training corpus/invalid"
What is the reason for this and how can I fix this?
Upvotes: 2
Views: 1953
Reputation: 10214
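Solved it.
Iterating over model.docvecs directly does not work. To get all document vectors at once, I need to read the underlying array instead, using either
model.docvecs.doctag_syn0
(soon to be deprecated)
or
model.docvecs.vectors_docs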
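As for the reason behind the KeyError, here is a minimal sketch of what I believe is going on. FakeDocvecs is a hypothetical stand-in that I wrote for illustration, not gensim's actual code. It assumes that model.docvecs supports lookup by tag but defines no iterator of its own. In that case Python's for loop falls back to calling the object with indices 0, 1, 2, ... and only stops on an IndexError. common_texts has 9 documents (tags 0 to 8), so the lookup for 9 raises a KeyError, which is not caught by the loop.

```python
class FakeDocvecs:
    """Hypothetical stand-in for model.docvecs; not gensim's actual code."""

    def __init__(self, vectors):
        # One vector per document, stored in tag order (0, 1, 2, ...).
        self.vectors_docs = vectors

    def __getitem__(self, tag):
        # Known integer tags return their vector; anything else raises
        # KeyError, mirroring the message from the question.
        if isinstance(tag, int) and 0 <= tag < len(self.vectors_docs):
            return self.vectors_docs[tag]
        raise KeyError("tag '%s' not seen in training corpus/invalid" % tag)


# 9 documents with 5-dimensional dummy vectors, tags 0..8.
docvecs = FakeDocvecs([[0.1 * i] * 5 for i in range(9)])

# No __iter__ is defined, so the for loop calls docvecs[0], docvecs[1], ...
# and would only stop on IndexError. The KeyError for tag 9 escapes instead.
seen = 0
error = None
try:
    for vector in docvecs:
        seen += 1
except KeyError as exc:
    error = exc

# Iterating over the underlying sequence of vectors works as expected.
all_vectors = [vector for vector in docvecs.vectors_docs]
```

With the real model, the equivalent of the last line is iterating over model.docvecs.vectors_docs. As far as I know, gensim 4.x renamed this to model.dv.vectors, so check which version you have installed.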
Upvotes: 2