How to get most similar words to a document in gensim doc2vec?

Question

I have built a gensim Doc2vec model. Let's call it doc2vec. Now I want to find the most relevant words to a given document according to my doc2vec model.

For example, I have a document about "java" with the tag "doc_about_java". When I ask for similar documents, I get documents about other programming languages and topics related to java. So my document model works well.

Now I want to find the most relevant words to "doc_about_java".

I followed the solution from the closed question "How to find most similar terms/words of a document in doc2vec?", but it gives me seemingly random words. The word "java" is not even among the first 100 similar words:

docvec = doc2vec.docvecs['doc_about_java']
print doc2vec.most_similar(positive=[docvec], topn=100)

I also tried this:

print doc2vec.wv.similar_by_vector(doc2vec["doc_about_java"])

but it didn't change anything. How can I find the most similar words to a given document?
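
For reference, the lookup attempted above amounts to ranking word vectors by their cosine similarity to the document vector. Here is a minimal sketch of that ranking using only the standard library. The three-dimensional vectors and the helper names (cosine, most_similar_words) are made up for illustration and do not come from any trained model or from gensim itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_words(doc_vector, word_vectors, topn=10):
    """Rank words by cosine similarity to a document vector (hypothetical helper)."""
    scored = [(word, cosine(doc_vector, vec)) for word, vec in word_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy data (assumption): stands in for the model's word vectors and one doc vector.
toy_word_vectors = {
    "java": [0.9, 0.1, 0.0],
    "python": [0.8, 0.3, 0.1],
    "banana": [0.0, 0.2, 0.9],
}
toy_doc_vector = [1.0, 0.2, 0.0]

print(most_similar_words(toy_doc_vector, toy_word_vectors, topn=2))
```

A ranking like this is only meaningful if the word vectors and document vectors were trained into the same space. In gensim that is likely the case with dm=1, or with dm=0 combined with dbow_words=1. In plain PV-DBOW mode (dm=0 without dbow_words) the word vectors are not trained, which would be one possible explanation for seemingly random results.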

