Reputation: 526
The most_similar method finds the top-N most similar words.
Is there a method or a way to find the N least similar words?
Upvotes: 1
Views: 1272
Reputation: 54183
You could get the full ranked list of all vectors by similarity, using a topn parameter as large as the full set of vectors. Then look at just the last N. For example:
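import sys
# Ask for every vector, ranked from most to least similar
all_sims = vec_model.most_similar(target_value, topn=sys.maxsize)
# Take the final 10 entries and reverse them so the least similar comes first
last_10 = list(reversed(all_sims[-10:]))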
However, note:
This will require a bit more sorting, and will momentarily need a lot more memory, because the full ranked list is returned before it is trimmed to the last few entries.
The least-similar results are unlikely to be especially meaningful, as either words or documents, to human perception. That is, a result is unlikely to be a word's or document's 'opposite' in the senses we perceive. Such opposites, or indeed any words/docs that are interestingly contrasted with an origin point, are usually going to be quite close to the origin in the high-dimensional space, just shifted in some meaningful way. (For example, a word's antonyms are far closer to the word than the most-dissimilar words this will find.)
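On the first caveat: if the full sort and the full ranked list are a concern, one option is to score every vector once and keep only the N smallest similarities with a bounded heap. The following is a minimal sketch of that idea, not gensim's own API. The names `cosine`, `least_similar`, and `toy_vectors` are hypothetical, and it assumes the vectors are available as a plain dict of key to list of floats; with a real model you would first have to pull the vectors out of it.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def least_similar(target, vectors, n=10):
    """Return the n keys least similar to `target` as (key, similarity) pairs.

    `vectors` is assumed to be a dict mapping keys to lists of floats
    (a stand-in for a real model's vectors). The result is ordered from
    least similar upward.
    """
    target_vec = vectors[target]
    # Generator: similarities are produced one at a time, never stored in full
    scored = (
        (cosine(target_vec, vec), key)
        for key, vec in vectors.items()
        if key != target
    )
    # nsmallest keeps only n candidates at a time instead of sorting everything
    return [(key, sim) for sim, key in heapq.nsmallest(n, scored)]


# Toy data for illustration only
toy_vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
result = least_similar("a", toy_vectors, n=2)
print(result)
```

In the toy data, "d" points in the opposite direction from "a" and "c" is orthogonal to it, so those two come back first. The second caveat still applies: the keys this finds are the geometrically farthest ones, not meaningful opposites.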
Upvotes: 1