Yavar

Reputation: 11933

Gensim word2vec finding nearest words given a word

How can I find the N nearest words to a given word using gensim's word2vec implementation? What is the API for that? I am referring to skip-gram models here. Maybe I missed something; I have read all about finding similar words, finding the odd one out, and so on...

In DL4J there is a method, wordsNearest(String A, int n), which returns the n nearest words to A. What is the equivalent in Gensim?

Upvotes: 2

Views: 2363

Answers (2)

Kamil Sindi

Reputation: 22822

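If I understand your question correctly, you can use most_similar. Its topn argument sets how many neighbours are returned (the default is 10). In recent gensim versions the method is called on the model's word vectors, model.wv, rather than on the model itself:

model.wv.most_similar(positive=['woman'], topn=10)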
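For a closer match to DL4J's wordsNearest(A, n), the call can be wrapped in a small helper. This is only a sketch: the name words_nearest is made up for illustration, and it assumes it is handed gensim's word-vector object (model.wv on a trained Word2Vec model), whose most_similar returns (word, similarity) pairs sorted from most to least similar.

```python
def words_nearest(keyed_vectors, word, n=10):
    """Return the n words closest to `word`, most similar first.

    `keyed_vectors` is assumed to be a gensim KeyedVectors object,
    e.g. `model.wv` for a trained Word2Vec model. `words_nearest`
    itself is a hypothetical helper, not part of gensim.
    """
    # most_similar yields (word, cosine_similarity) pairs; keep only the words.
    pairs = keyed_vectors.most_similar(positive=[word], topn=n)
    return [neighbour for neighbour, _similarity in pairs]

# Example call, assuming `model` is a trained gensim Word2Vec model:
# words_nearest(model.wv, 'woman', n=5)
```

If you also want the similarity scores, return the pairs unchanged instead of stripping them.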

Upvotes: 0

user2918461

This question is really old, but anyway: I'm still not entirely sure how hierarchical softmax and negative sampling work, but in principle you should be able to take the word's vector from the input matrix, compute its dot product with each vector in the output matrix, and pick the words with the highest values.

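import numpy as np

w1_vec = model[word]  # Input-matrix vector for the word of interest (model.wv[word] in gensim 4+).
# Loop over the rows of the output matrix and take dot products.
for idx, w2_vec in enumerate(model.syn1neg):
    # index2word maps a row index to its word (model.wv.index_to_key in gensim 4+).
    print(idx, model.index2word[idx], np.exp(np.dot(w1_vec, w2_vec)))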

Then choose the highest values from the output. Use model.syn1neg if the model was trained with negative sampling, or model.syn1 if it was trained with hierarchical softmax. I've used this technique on a few sample texts and the results are reasonable.
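The "choose the highest values" step can also be done without printing the whole vocabulary. The following is a rough sketch using plain NumPy on toy data. The function name top_n_by_output_score and the toy matrix are made up for illustration; in practice you would pass the word's input vector, model.syn1neg, and the model's index-to-word list. Since exp is monotonic, ranking by the raw dot product gives the same order as ranking by its exponential.

```python
import numpy as np

def top_n_by_output_score(w1_vec, output_matrix, index_to_word, n=10):
    """Rank words by the dot product of w1_vec with each output-matrix row."""
    scores = output_matrix @ w1_vec        # one dot product per vocabulary row
    best = np.argsort(scores)[::-1][:n]    # row indices of the n largest scores
    return [(index_to_word[i], float(scores[i])) for i in best]

# Toy stand-ins for the output matrix and the vocabulary list (assumed data).
toy_output = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]])
toy_vocab = ["cat", "dog", "pet"]

result = top_n_by_output_score(np.array([1.0, 0.2]), toy_output, toy_vocab, n=2)
print([word for word, _score in result])   # words ranked best first
```

For a large vocabulary, np.argpartition would avoid sorting every score, at the cost of a slightly longer function.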

Upvotes: 1
