kee

Reputation: 11619

How to search Word2Vec or GloVe embeddings to find words by semantic relationship

A common example of word embeddings' strength is the semantic relationship between words, such as king:queen = male:female. How can this type of relationship be discovered? Is it done through some kind of visualization based on geometric clustering? Any pointers would be appreciated.

Upvotes: 1

Views: 1074

Answers (1)

Maxim

Reputation: 53758

If by "discovered" you mean supervised learning, there are datasets that contain lots of already extracted relationships, such as "city-in-state", "capital-world", "superlative", etc.

This dataset is a popular choice for intrinsic evaluation of word vectors in completing word vector analogies. See also this question.
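For intuition, these analogies are usually completed with the vector-offset ("3CosAdd") method: man : king = woman : ? is answered by the vocabulary word whose vector lies closest, by cosine similarity, to vec(king) - vec(man) + vec(woman). A minimal NumPy sketch, assuming `vecs` is a hypothetical dict mapping words to embedding vectors:

import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def analogy(vecs, a, b, c):
    """Solve a : b = c : ? via the vector offset vec(b) - vec(a) + vec(c)."""
    target = normalize(vecs[b] - vecs[a] + vecs[c])
    best_word, best_sim = None, -1.0
    for word, v in vecs.items():
        if word in (a, b, c):
            continue  # the query words themselves are trivial near-matches
        sim = float(normalize(v) @ target)  # cosine similarity of unit vectors
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, best_sim

This is essentially what gensim's most_similar(positive=..., negative=...) computes, just done by brute force over the vocabulary.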

Efficient unsupervised extraction of these relationships can be tricky. A naive algorithm requires O(n²) time and memory, where n is the number of words in the vocabulary, which is huge. In general, this problem boils down to efficient index construction.
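To see where the quadratic cost comes from: extracting every strongly related pair means scoring all n·(n-1)/2 word pairs. A rough sketch of that naive scan, assuming `E` is an (n, d) NumPy array of unit-normalized embeddings and `words` is the matching vocabulary list (done blockwise so the full n×n similarity matrix is never materialized at once):

import numpy as np

def related_pairs(E, words, threshold=0.7, block=1024):
    """Naive O(n^2) scan: all word pairs with cosine similarity above threshold."""
    n = E.shape[0]
    pairs = []
    for start in range(0, n, block):
        sims = E[start:start + block] @ E.T      # (block, n) cosine similarities
        rows, cols = np.nonzero(sims > threshold)
        for r, c in zip(rows, cols):
            i = start + r
            if i < c:                            # keep each unordered pair once
                pairs.append((words[i], words[c], float(sims[r, c])))
    return pairs

An approximate nearest-neighbor index is what replaces this scan in practice, which is why the problem reduces to index construction.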

But if you just want to train a model yourself and play around with word embeddings, you can simply use gensim:

import gensim

# `sentences` is assumed to be an iterable of tokenized sentences (lists of words).
# Note: gensim >= 4.0 renames size -> vector_size and iter -> epochs.
model = gensim.models.word2vec.Word2Vec(sentences=sentences, size=100, window=4,
                                        workers=5, sg=1, min_count=20, iter=10)
word_vectors = model.wv
similar = word_vectors.most_similar(positive=['woman', 'king'], negative=['man'])
# [(u'queen', 0.7188869714736938), (u'empress', 0.6739267110824585), ...

Note that you'll need a big corpus for that, such as text8.
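If you don't have a corpus at hand, one way to stream text8 is gensim's bundled downloader (a sketch, assuming the downloader module available in gensim >= 3.4):

import gensim.downloader as api

sentences = api.load("text8")  # iterable of tokenized word lists from the text8 corpus
# pass `sentences` into the Word2Vec call above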

Upvotes: 1
