jxn

Reputation: 8025

python glove similarity measure calculation

I am trying to understand how python-glove computes most-similar terms.

Is it using cosine similarity?

Example from the python-glove GitHub repository: https://github.com/maciejkula/glove-python/tree/master/glove

I know that in gensim's word2vec, the most_similar method ranks words by cosine similarity.

Upvotes: 4

Views: 7451

Answers (3)

Memo

Reputation: 1

Yes, it uses cosine similarity.

The paper says so in the text: "... A similarity score is obtained from the word vectors by first normalizing each feature across the vocabulary and then calculating the cosine similarity...."

Upvotes: -1

droid

Reputation: 291

The project website is a bit unclear on this point:

The Euclidean distance (or cosine similarity) between two word vectors provides an effective method for measuring the linguistic or semantic similarity of the corresponding words.

Euclidean distance is not the same as cosine similarity. The quote suggests that either works well enough, but it does not say which one is used.
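To make the difference concrete, here is a minimal sketch with made-up 2-D vectors (the helper names and the data are illustrative assumptions, not taken from GloVe or glove-python). It shows a case where the two measures disagree about which vector is closest to a query:

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between u and v; ignores vector length.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean_distance(u, v):
    # Straight-line distance; sensitive to vector length.
    return float(np.linalg.norm(u - v))

# Toy vectors, chosen only for illustration.
query = np.array([1.0, 1.0])
same_direction = np.array([10.0, 10.0])  # same angle as query, much longer
nearby = np.array([1.0, 0.0])            # close in space, different angle

# Cosine similarity prefers the vector pointing the same way...
print(cosine_similarity(query, same_direction), cosine_similarity(query, nearby))
# ...while Euclidean distance prefers the one that is physically closer.
print(euclidean_distance(query, same_direction), euclidean_distance(query, nearby))
```

If all vectors are first scaled to unit length, the two measures give the same ranking, because for unit vectors the squared Euclidean distance equals 2 - 2 * (cosine similarity).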

The source of the repo you are looking at settles the question:

dst = (np.dot(self.word_vectors, word_vec)
       / np.linalg.norm(self.word_vectors, axis=1)
       / np.linalg.norm(word_vec))

This divides each dot product by the norms of both vectors, so it uses cosine similarity.
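As a rough, self-contained sketch of how that expression can drive a most-similar lookup: the function below reuses the same computation on a toy vocabulary. The function name, its signature, and the data are hypothetical and are not the glove-python API.

```python
import numpy as np

def most_similar(word_vectors, words, query, number=3):
    """Hypothetical helper: rank words by cosine similarity to `query`."""
    idx = words.index(query)
    word_vec = word_vectors[idx]
    # Same expression as in the quoted source: dot products divided by
    # the norm of every row and by the norm of the query vector.
    dst = (np.dot(word_vectors, word_vec)
           / np.linalg.norm(word_vectors, axis=1)
           / np.linalg.norm(word_vec))
    order = np.argsort(-dst)  # highest similarity first
    # Skip the query word itself, then keep the top `number` results.
    return [(words[i], float(dst[i])) for i in order if i != idx][:number]

# Toy vocabulary and vectors, made up for illustration.
words = ["a", "b", "c", "d"]
vectors = np.array([[1.0, 0.0],
                    [2.0, 0.1],
                    [0.0, 1.0],
                    [-1.0, 0.0]])

result = most_similar(vectors, words, "a", number=2)
print(result)
```

Here "b" ranks first even though its vector is twice as long as the one for "a", because only the angle between the vectors matters.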

Upvotes: 2

AbdealiLoKo

Reputation: 3337

The GloVe project website explains this fairly clearly: http://www-nlp.stanford.edu/projects/glove/

In order to capture in a quantitative way the nuance necessary to distinguish man from woman, it is necessary for a model to associate more than a single number to the word pair. A natural and simple candidate for an enlarged set of discriminative numbers is the vector difference between the two word vectors. GloVe is designed in order that such vector differences capture as much as possible the meaning specified by the juxtaposition of two words.

To read more about the math behind this, see the "Model overview" section on the website.

Upvotes: 1
