Reputation: 8025
I am trying to understand how python-glove computes most-similar
terms.
Is it using cosine similarity?
Example from the python-glove GitHub repo:
https://github.com/maciejkula/glove-python/tree/master/glove
I know that gensim's word2vec most_similar
method computes similarity using cosine distance.
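For reference, this is roughly what I mean by cosine similarity (a minimal sketch with toy vectors, not gensim's actual implementation):

```python
import numpy as np

# Plain cosine similarity between two vectors -- what I understand
# gensim's most_similar to rank candidate words by.
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))
```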
Upvotes: 4
Views: 7451
Reputation: 1
Yes, it uses cosine similarity.
The paper mentions this in the text: "... A similarity score is obtained from the word vectors by first normalizing each feature across the vocabulary and then calculating the cosine similarity. ..."
Upvotes: -1
Reputation: 291
The project website is a bit unclear on this point:
The Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words.
Euclidean distance is not the same as cosine similarity. It sounds like either works well enough, but the site does not specify which one is used.
However, we can look at the source of the repo you linked to see:
# dot product of the query vector with every row of the embedding
# matrix, divided by both norms -- i.e. row-wise cosine similarity
dst = (np.dot(self.word_vectors, word_vec)
       / np.linalg.norm(self.word_vectors, axis=1)
       / np.linalg.norm(word_vec))
It uses cosine similarity.
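You can check this yourself with a small sketch: the repo's vectorized expression gives the same numbers as an explicit per-row cosine-similarity computation (the variable names mirror the snippet above; the data is random toy input, not GloVe vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(5, 3))   # stand-in embedding matrix
word_vec = word_vectors[2]               # the query word's vector

# the vectorized expression from glove-python's source
dst = (np.dot(word_vectors, word_vec)
       / np.linalg.norm(word_vectors, axis=1)
       / np.linalg.norm(word_vec))

# explicit cosine similarity, one row at a time
cos = np.array([np.dot(v, word_vec)
                / (np.linalg.norm(v) * np.linalg.norm(word_vec))
                for v in word_vectors])

print(np.allclose(dst, cos))  # the two computations agree
```

Note also that dst[2] comes out as 1.0: a word is maximally cosine-similar to itself, which is why most_similar implementations typically drop the query word from the results.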
Upvotes: 2
Reputation: 3337
On the GloVe project website, this is explained with a fair amount of clarity: http://www-nlp.stanford.edu/projects/glove/
In order to capture in a quantitative way the nuance necessary to distinguish man from woman, it is necessary for a model to associate more than a single number to the word pair. A natural and simple candidate for an enlarged set of discriminative numbers is the vector difference between the two word vectors. GloVe is designed in order that such vector differences capture as much as possible the meaning specified by the juxtaposition of two words.
To read more about the math behind this, check the "Model overview" section on the website.
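The vector-difference idea can be sketched with toy numbers (the 2-d "embeddings" below are made up purely for illustration; real GloVe vectors are high-dimensional and only approximately satisfy this):

```python
import numpy as np

# Toy embeddings: one axis loosely encodes "royalty",
# the other "gender". All values are invented for illustration.
vectors = {
    "man":   np.array([0.1,  1.0]),
    "woman": np.array([0.1, -1.0]),
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
}

# GloVe's design goal: a vector difference isolates a relation,
# so king - man and queen - woman point in the same direction.
diff1 = vectors["king"] - vectors["man"]
diff2 = vectors["queen"] - vectors["woman"]
print(np.allclose(diff1, diff2))  # True for these toy vectors
```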
Upvotes: 1