Ruan

Reputation: 902

What is the semantic relationship expected between word vectors which are scalar multiples of each other in word2vec?

Let's say you have a word vector for the word queen. Some of its scalar multiples would be x = queen + queen, y = queen + queen + queen, and n * queen for any real value of n (so we're also considering non-integer values of n, such as 0.83 * queen).

Consider x to be the word most similar to the vector queen + queen, where similarity is the cosine similarity between each candidate word's projection weight vector and a simple mean of the given vectors (the way gensim's most_similar ranks candidates).

Consider y to be the word most similar to the vector queen + queen + queen by the same method.
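For concreteness, here's a minimal sketch of how I'd compute x and y with gensim (the pretrained model name is just a placeholder; any KeyedVectors would do):

    import gensim.downloader as api

    wv = api.load("glove-wiki-gigaword-50")  # placeholder model choice
    q = wv["queen"]

    # most_similar accepts raw vectors and ranks the whole vocabulary
    # by cosine similarity to the (mean of the) given vector(s)
    x = wv.most_similar(positive=[q + q], topn=1)
    y = wv.most_similar(positive=[q + q + q], topn=1)
    print(x, y)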

Then what is the semantic relationship expected between the words x, y and queen? I know these vectors all have the same ratios between their components (they point in the same direction), but I'm having a hard time figuring out how to read this in terms of word meaning.

My intuition says that I'll get something in another context that occupies a position in that context similar to the one queen occupies in its own. For instance, a queen's "wealth" may be significantly larger than a queen's "beauty". So I'll get another word, in another context, that has the same wealth/beauty balance as "queen".

So let's say I'm moving from royal titles (queen, king, princess...) to the Forbes list (Jeff Bezos, Bill Gates, Warren Buffett...) when I multiply queen by n.

queen * n = someone on the Forbes list who has the same wealth/beauty balance as a queen (very wealthy, but not very pretty)

princess * n = someone on the Forbes list who has the same wealth/beauty balance as a princess (moderately wealthy, but very pretty)

However, this is just a wild theory; I have no clue how to systematically test whether it's real.

Upvotes: 1

Views: 167

Answers (1)

gojomo

Reputation: 54173

The words that are most cosine-similar to wv['queen'] will be exactly the same as those most cosine-similar to n * wv['queen'], for any positive n, because cosine-similarity is unaffected by vector magnitude. So, your assumption is wrong.
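You can check this with a toy calculation; the vectors below are random stand-ins, not from a real model:

    import numpy as np

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    rng = np.random.default_rng(0)
    v = rng.normal(size=100)  # stand-in for wv['queen']
    w = rng.normal(size=100)  # stand-in for any other word's vector

    for n in (1.0, 2.0, 3.0, 0.83):
        print(n, cosine(n * v, w))  # same value for every positive n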

If you were to use euclidean-distance instead of cosine-similarity, on the raw (not unit-normalized) word vectors, you might find some other interesting relationships... but that's not a typical way to use/compare word-vectors, so you'd have to experiment & I have no expectations of what you might find or whether it would be useful.
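If you wanted to try that experiment, a sketch might look like this (assumes a gensim KeyedVectors named wv, as in your question; euclidean_neighbors is a hypothetical helper, not a library call):

    import numpy as np

    def euclidean_neighbors(wv, query_vec, topn=10):
        # wv.vectors holds the raw projection weights, one row per word
        dists = np.linalg.norm(wv.vectors - query_vec, axis=1)
        best = np.argsort(dists)[:topn]
        return [(wv.index_to_key[i], float(dists[i])) for i in best]

    q = wv["queen"]
    # unlike with cosine-similarity, these two queries can return
    # different neighbors, because magnitude now matters
    print(euclidean_neighbors(wv, 2 * q))
    print(euclidean_neighbors(wv, 3 * q))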

In general, the raw non-unit-normalized word-vectors tend to have a higher magnitude for words that have a single narrow sense (all contexts they appear in are very similar), while words with many senses and varied contexts tend to have smaller magnitudes. But I'm not sure you can count on this for much. Once word-vectors are normalized to unit-length – and thus all words are on the same 'unit sphere' – then the rank order of nearest-neighbors will be the same by either cosine-distance or euclidean-distance (even though the magnitudes of the distance/similarity numbers won't be identical or proportionate at each rank).
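That last claim is easy to verify numerically: for unit vectors, ||a - b||^2 = 2 - 2 * cos(a, b), so euclidean-distance is a monotone function of cosine-similarity. A toy sketch with random unit vectors:

    import numpy as np

    rng = np.random.default_rng(1)
    vecs = rng.normal(size=(1000, 50))
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit sphere

    q = vecs[0]
    cos_rank = np.argsort(-vecs @ q, kind="stable")  # most similar first
    euc_rank = np.argsort(np.linalg.norm(vecs - q, axis=1), kind="stable")  # closest first

    print(np.array_equal(cos_rank, euc_rank))  # True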

Upvotes: 2
