Reputation: 11933
How can I find the N nearest words to a given word using gensim's word2vec implementation? What is the API for that? I am referring to the skip-gram model here. Maybe I missed something; I have read all about finding similar words, finding the odd one out, and so on...
In DL4J I have a method called wordsNearest(String A, int n), which gives me the n nearest words to A. What is the equivalent of this in gensim?
Upvotes: 2
Views: 2363
Reputation: 22822
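If I understand your question correctly, you can use most_similar. Its topn argument sets how many neighbours are returned (10 by default). In newer gensim releases the method lives on model.wv, i.e. model.wv.most_similar:
model.most_similar(positive=['woman'], topn=n)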
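For intuition, most_similar ranks words by cosine similarity to the query vector and leaves the query word itself out of the results. Here is a minimal NumPy sketch of that idea. It does not call gensim: the function name n_nearest, the toy vocabulary, and the vectors are all made up for illustration.

```python
import numpy as np


def n_nearest(word, vocab, vectors, n=3):
    """Return the n words with the highest cosine similarity to `word`.

    `vocab` is a list of words and `vectors` a 2-D array whose i-th row
    is the embedding of vocab[i]. Both are hypothetical toy inputs.
    """
    # Normalise every row to unit length so a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = unit[vocab.index(word)]
    sims = unit @ query
    # Sort from most to least similar and skip the query word itself.
    order = np.argsort(-sims)
    return [(vocab[i], float(sims[i])) for i in order if vocab[i] != word][:n]


# Toy embeddings, invented for this example only.
vocab = ["king", "queen", "apple", "pear"]
vectors = np.array([
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.1],
    [0.0, 0.1, 1.0],
    [0.1, 0.0, 0.9],
])

print(n_nearest("king", vocab, vectors, n=2))
```

With a trained model you would normally just call most_similar. The sketch only shows what kind of ranking that call performs.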
Upvotes: 0
Reputation:
This question is really old, but anyway: I'm still not entirely sure how hierarchical softmax and negative sampling work, but in principle you should be able to take the word's vector from the input matrix, compute its dot product with each vector in the output matrix, and pick the words with the highest values.
import numpy as np

w1_vec = model[word]  # Input-matrix vector for the word you're interested in.
# Loop over the rows of the output matrix and take dot products with w1_vec.
for idx, w2_vec in enumerate(model.syn1neg):
    print(idx, model.index2word[idx], np.exp(np.dot(w1_vec, w2_vec)))
Then choose the highest values from the output. Use syn1neg if the model was trained with negative sampling, or syn1 if it was trained with hierarchical softmax. I've used this technique on a few sample texts and the results are reasonable.
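To avoid reading through the printed scores by hand, the same idea can be written as a function that sorts them and returns the top n. This is only a sketch with toy matrices: syn0, syn1neg, and index2word below are invented NumPy stand-ins for the model's input matrix, output matrix, and vocabulary list, and top_n_by_output_score is a hypothetical helper name.

```python
import numpy as np


def top_n_by_output_score(word, index2word, syn0, syn1neg, n=3):
    """Rank words by the dot product of `word`'s input vector with each output vector."""
    w1_vec = syn0[index2word.index(word)]
    # One matrix-vector product replaces the explicit loop over output rows.
    scores = syn1neg @ w1_vec
    # np.exp is monotonic, so sorting the raw dot products gives the same order.
    best = np.argsort(-scores)[:n]
    return [(index2word[i], float(np.exp(scores[i]))) for i in best]


# Toy data, made up for illustration; a real model would supply these arrays.
index2word = ["cat", "dog", "car"]
syn0 = np.array([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]])      # input vectors
syn1neg = np.array([[0.2, 0.1], [0.9, 0.0], [0.0, 0.7]])   # output vectors

print(top_n_by_output_score("cat", index2word, syn0, syn1neg, n=2))
```

Unlike most_similar, this ranking does not exclude the query word, so it can show up in its own results.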
Upvotes: 1