vijay athithya

Reputation: 1529

Find similarity with doc2vec like word2vec

Is there a way to find similar documents with doc2vec, the way we find similar words with word2vec?

For example:

  model2.most_similar(positive=['good','nice','best'],
    negative=['bad','poor'],
    topn=10)

I know we can call infer_vector and feed the resulting vectors in to find similar documents, but I want to supply many positive and negative examples, as we do in word2vec.

Is there any way to do that? Thanks!

Upvotes: 0

Views: 295

Answers (2)

gojomo

Reputation: 54153

The doc-vectors part of a Doc2Vec model works just like word-vectors, with respect to a most_similar() call. You can supply multiple doc-tags or full vectors inside both the positive and negative parameters.

So you could call...

sims = d2v_model.docvecs.most_similar(positive=['doc001', 'doc009'], negative=['doc102'])

...and it should work. The elements of the positive or negative lists could be doc-tags that were present during training, or raw vectors (like those returned by infer_vector(), or your own averages of multiple such vectors).
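To make concrete what a call with several positive and negative entries roughly computes, here is a plain-Python sketch. It is an illustration under stated assumptions, not gensim's implementation: the function `most_similar` below, the `toy_docs` table, and its two-dimensional vectors are all made up for this example. The idea is to unit-normalize each input, weight positives by +1 and negatives by -1, average them into one query vector, and rank the remaining documents by cosine similarity.

```python
import math


def unit(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def most_similar(vectors, positive=(), negative=(), topn=10):
    """Toy stand-in for a most_similar() call (hypothetical helper).

    vectors: dict mapping doc-tag -> list of floats.
    positive/negative: doc-tags (str) or raw vectors (lists of floats).
    """
    def resolve(item):
        # A string is looked up as a doc-tag; anything else is a raw vector.
        return vectors[item] if isinstance(item, str) else item

    weighted = [(unit(resolve(p)), 1.0) for p in positive]
    weighted += [(unit(resolve(n)), -1.0) for n in negative]
    if not weighted:
        raise ValueError("need at least one positive or negative example")

    dim = len(weighted[0][0])
    # Signed average of the normalized inputs, normalized again.
    query = unit([sum(w * v[i] for v, w in weighted) / len(weighted)
                  for i in range(dim)])

    # Tags used as inputs are left out of the results.
    used = {item for item in list(positive) + list(negative)
            if isinstance(item, str)}
    scores = [(tag, sum(a * b for a, b in zip(query, unit(vec))))
              for tag, vec in vectors.items() if tag not in used]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Made-up 2-d "doc-vectors", purely for illustration.
toy_docs = {
    'doc001': [1.0, 0.0],
    'doc009': [0.9, 0.1],
    'doc102': [0.0, 1.0],
    'doc200': [1.0, 0.1],
    'doc300': [0.1, 1.0],
}

sims = most_similar(toy_docs,
                    positive=['doc001', 'doc009'],
                    negative=['doc102'],
                    topn=2)
```

With a real model you would not need any of this; the sketch only shows why mixing doc-tags and raw vectors (such as an `infer_vector()` result) in the same lists is reasonable: both end up as plain vectors before they are combined.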

Upvotes: 1

Adnan S

Reputation: 1882

I don't believe there is a pre-written function for this.

One approach would be to write a function that iterates through each word in the positive list and retrieves the top n most similar words for that word.

So for positive words in your question example, you would end up with 3 lists of 10 words.

You could then treat the words that are common across the 3 lists as the top n most similar to your positive list. Since not all words will appear in all 3 lists, you probably need to fetch the top 20 similar words per input word when iterating, so that you still end up with the top 10 words you want in your example.

Then do the same for negative words.
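The intersection idea above could be sketched like this. Everything named here is hypothetical: `common_neighbors`, the `top_similar` callback, and the `toy_neighbors` table are invented for the example. With a real model, `top_similar` would wrap whatever per-word similarity lookup you have. Ordering the shared words by their summed rank across the lists is an added assumption; the description above does not say how to order them.

```python
def common_neighbors(words, top_similar, topn=10, pool=20):
    """Return up to `topn` candidates found in every word's top-`pool` list.

    top_similar(word, n) must return a list of the n nearest neighbours of
    `word`, best first. It is supplied by the caller (hypothetical callback).
    """
    lists = [top_similar(word, pool) for word in words]
    if not lists:
        return []

    # Keep only candidates that appear in all of the per-word lists.
    shared = set(lists[0]).intersection(*lists[1:])

    # Assumed ordering: smaller summed position across the lists ranks higher;
    # the candidate itself breaks ties so the result is deterministic.
    ranked = sorted(shared,
                    key=lambda c: (sum(lst.index(c) for lst in lists), c))
    return ranked[:topn]


# Made-up neighbour lists standing in for real model output.
toy_neighbors = {
    'good': ['great', 'fine', 'nice', 'solid', 'decent'],
    'nice': ['pleasant', 'great', 'fine', 'good', 'lovely'],
    'best': ['great', 'top', 'finest', 'fine', 'greatest'],
}


def toy_top_similar(word, n):
    return toy_neighbors[word][:n]


shared_positive = common_neighbors(['good', 'nice', 'best'],
                                   toy_top_similar, topn=2, pool=5)
```

The `pool` parameter plays the role of the "top 20" mentioned above: fetching more neighbours per word than you ultimately want leaves room for the intersection to shrink the list. The negative list can be passed through the same function.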

Upvotes: 0
