Find the closest word to a set of words

Question

I need something like the opposite of model.most_similar().
While most_similar() returns a list of the words most similar to the one given as input, I need to find a sort of "center" of a list of words.

Is there a function in gensim or any other tool that could help me?

Example:
Given {'chimichanga', 'taco', 'burrito'}, the center might be 'mexico' or 'food', depending on the corpus the model was trained on.

gojomo · Accepted Answer

If you supply a list of words as the positive argument to most_similar(), it will report words closest to their mean (which would seem to be one reasonable interpretation of the words' 'center').

For example:

sims = model.most_similar(positive=['chimichanga', 'taco', 'burrito'])

(I somewhat doubt the top result, sims[0], will be either 'mexico' or 'food'; it's most likely to be another Mexican-food word. There isn't necessarily a "more generic"/hypernym relation to be found in word2vec, either between words or along particular directions... but some other embedding techniques, such as hyperbolic embeddings, might provide that.)
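
To make "closest to their mean" concrete, here is a rough pure-Python sketch of the idea: unit-normalise the input vectors, average them, then rank the remaining words by cosine similarity to that average. The toy_vectors table and the closest_to_mean() helper are made up for illustration; they are not part of gensim, and gensim's own implementation may differ in details.

```python
import math

# Toy 3-d embeddings, invented for illustration only (not from a real model).
toy_vectors = {
    "taco":        [0.90, 0.10, 0.00],
    "burrito":     [0.80, 0.20, 0.10],
    "chimichanga": [0.85, 0.15, 0.05],
    "enchilada":   [0.88, 0.12, 0.02],
    "food":        [0.50, 0.50, 0.10],
    "car":         [0.00, 0.10, 0.90],
}

def unit(v):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def closest_to_mean(vectors, words, topn=3):
    """Hypothetical helper: rank words by cosine similarity to the mean of `words`."""
    normed = {w: unit(v) for w, v in vectors.items()}
    dims = len(next(iter(normed.values())))
    # The "center": average of the unit-length input vectors, re-normalised.
    mean = unit([sum(normed[w][i] for w in words) / len(words) for i in range(dims)])
    # Score every other word; the input words themselves are excluded.
    scores = [
        (w, sum(a * b for a, b in zip(mean, v)))
        for w, v in normed.items()
        if w not in words
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

sims = closest_to_mean(toy_vectors, ["chimichanga", "taco", "burrito"])
for word, score in sims:
    print(word, round(score, 3))
```

With these toy numbers the nearest neighbour is another dish ('enchilada') rather than the more generic 'food', which mirrors the caveat above: the mean tends to land among sibling words, not on a hypernym.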
