user8871463

Get a similarity matrix from word2vec in python (Gensim)

I am using the following Python code to generate a similarity matrix of word vectors (my vocabulary size is 77).

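import gensim
import numpy as np

# model is a trained gensim Word2Vec model (vocabulary size 77)
similarity_matrix = []
index = gensim.similarities.MatrixSimilarity(gensim.matutils.Dense2Corpus(model.wv.syn0))

# each iteration yields the similarities of one indexed document to all others
for sims in index:
    similarity_matrix.append(sims)
similarity_array = np.array(similarity_matrix)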

The shape of similarity_array is 300 x 300. However, as I understand it, the shape should be 77 x 77, since my vocabulary size is 77.

i.e.:

        word1  word2  ...  word77
word1   0.2    0.8    ...  0.9
word2   0.1    0.2    ...  1.0
...     ...    ...    ...  ...
word77  0.9    0.8    ...  0.1

Please let me know what is wrong with my code.

Moreover, I want to know the order of the vocabulary (word1, word2, ..., word77) used to calculate this similarity matrix. Can I obtain this order from model.wv.index2word?

Please help me!

Upvotes: 3

Views: 4509

Answers (2)

AReus

Reputation: 41

It's been a while since this question was posted, but maybe my answer will help. The code below gives the same result as index = gensim.similarities.MatrixSimilarity(gensim.matutils.Dense2Corpus(model.wv.syn0.T)) combined with the for loop, but it is more concise.

import numpy as np

model.wv.init_sims()  # gensim 3.x: fills model.wv.syn0norm, which is None until then
similarity_matrix = np.dot(model.wv.syn0norm, model.wv.syn0norm.T)

It computes the dot product between every pair of L2-normalized word vectors, i.e. the cosine similarity of each pair (a similarity, not a distance).
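A minimal NumPy sketch of the same computation, assuming a random matrix `vectors` as a stand-in for model.wv.syn0 (one row per word, 77 x 300) and a hypothetical `words` list standing in for model.wv.index2word. Each row is divided by its L2 norm (which is what syn0norm holds after init_sims()), so the dot product of two rows is their cosine similarity. Row i of syn0, and therefore row and column i of the result, belongs to index2word[i], which is how the matrix can be labelled.

```python
import numpy as np

# Hypothetical stand-ins: in gensim 3.x these would be model.wv.syn0
# (shape: vocab_size x vector_size) and model.wv.index2word.
rng = np.random.default_rng(0)
vocab_size, vector_size = 77, 300
vectors = rng.normal(size=(vocab_size, vector_size)).astype(np.float32)
words = [f"word{i + 1}" for i in range(vocab_size)]

# L2-normalize each row (one row per word), like syn0norm.
norms = np.linalg.norm(vectors, axis=1, keepdims=True)
normed = vectors / norms

# Pairwise cosine similarities: entry [i, j] compares words[i] and words[j].
similarity_matrix = normed @ normed.T
print(similarity_matrix.shape)  # (77, 77)

# Look up a pair by word, relying on the row order of `words`.
position = {w: i for i, w in enumerate(words)}
i, j = position["word1"], position["word2"]
explicit = vectors[i] @ vectors[j] / (norms[i, 0] * norms[j, 0])
print(np.isclose(similarity_matrix[i, j], explicit, atol=1e-5))  # True
```

To my knowledge, gensim 4.x removed syn0, syn0norm and index2word; the equivalents there are model.wv.get_normed_vectors() for the normalized matrix and model.wv.index_to_key for the row order.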

Upvotes: 3

Nicomedes E.

Reputation: 1334

Try replacing

index = gensim.similarities.MatrixSimilarity(gensim.matutils.Dense2Corpus(model.wv.syn0))

with

index = gensim.similarities.MatrixSimilarity(gensim.matutils.Dense2Corpus(model.wv.syn0.T))
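Why the transpose matters: Dense2Corpus reads each column of its input matrix as one document by default (documents_columns=True), and model.wv.syn0 stores one word per row, so without the transpose the index is built over the 300 vector dimensions rather than the 77 words. The sketch below imitates that column convention in plain NumPy with a hypothetical helper `column_cosine_matrix` and random data standing in for model.wv.syn0; it is not gensim's code, only the same shape logic.

```python
import numpy as np


def column_cosine_matrix(matrix):
    """Hypothetical helper: cosine similarity between the *columns* of
    `matrix`, mimicking Dense2Corpus(matrix) with documents_columns=True."""
    docs = matrix.T  # one row per column-document
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    return docs @ docs.T


rng = np.random.default_rng(1)
syn0 = rng.normal(size=(77, 300))  # stand-in: 77 words x 300 dimensions

wrong = column_cosine_matrix(syn0)    # columns are dimensions -> 300 x 300
right = column_cosine_matrix(syn0.T)  # columns are words -> 77 x 77
print(wrong.shape, right.shape)  # (300, 300) (77, 77)
```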

Upvotes: 4
