petezurich

Reputation: 10214

Gensim Doc2vec – KeyError: "tag not seen in training corpus/invalid"

I am using gensim's Doc2vec to learn features from news articles. Training the model on my documents works, but I struggle to retrieve the document vectors from the model for further processing.

Example code (directly taken from gensim's documentation):

from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.test.utils import common_texts

documents = [TaggedDocument(doc, [i]) for i, doc in enumerate(common_texts)]
model = Doc2Vec(documents, vector_size=5, window=2, min_count=1, workers=4)

This trains without error.

If I try to use model.docvecs directly or iterate over it like so:

for vector in model.docvecs:
    print(vector)

I get this error:

KeyError: "tag '9' not seen in training corpus/invalid"

What is the reason for this and how can I fix this?

Upvotes: 2

Views: 1953

Answers (1)

petezurich

Reputation: 10214

Solved it.

To get the document vectors as an array, I need to use

model.docvecs.doctag_syn0 (soon to be deprecated)

or

model.docvecs.vectors_docs
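
As for why the loop fails: a plausible explanation, assuming the docvecs object defines __getitem__ but no __iter__, is that Python falls back to calling __getitem__(0), __getitem__(1), and so on, and only an IndexError ends that loop. common_texts holds 9 documents (tags 0 to 8), so the lookup for tag 9 raises a KeyError that propagates. The sketch below reproduces this with a hypothetical stand-in class (TagLookup is not part of gensim), so it runs without a trained model:

```python
class TagLookup:
    """Hypothetical stand-in: defines __getitem__ but no __iter__ (an assumption
    about how the docvecs object behaves, not gensim's actual code)."""

    def __init__(self, vectors):
        # Map integer tags 0..n-1 to their vectors.
        self._vectors = dict(enumerate(vectors))

    def __len__(self):
        return len(self._vectors)

    def __getitem__(self, tag):
        if tag not in self._vectors:
            # KeyError, not IndexError, so a plain for-loop will not stop cleanly.
            raise KeyError(f"tag '{tag}' not seen in training corpus/invalid")
        return self._vectors[tag]


# Nine dummy 5-dimensional vectors, mirroring the nine documents in common_texts.
lookup = TagLookup([[0.1 * i] * 5 for i in range(9)])

seen = []
error_message = None
try:
    # Without __iter__, Python calls lookup[0], lookup[1], ... until IndexError.
    for vector in lookup:
        seen.append(vector)
except KeyError as exc:
    error_message = exc.args[0]

print(len(seen), error_message)

# Iterating over the known tags explicitly avoids the out-of-range lookup.
safe = [lookup[tag] for tag in range(len(lookup))]
print(len(safe))
```

Reading the whole array through one of the attributes above sidesteps the problem, as does looping over the known tags explicitly. As far as I know, gensim 4.x renamed model.docvecs to model.dv and exposes the array as model.dv.vectors, so check which names your installed version provides.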

Upvotes: 2
