Nicolò Gasparini

Reputation: 2396

Find the closest word to a set of words

I need something like the opposite of model.most_similar().
While most_similar() returns a list of the words most similar to the one given as input, I need to find a sort of "center" of a list of words.

Is there a function in gensim or any other tool that could help me?

Example:
Given {'chimichanga', 'taco', 'burrito'}, the center might be 'mexico' or 'food', depending on the corpus the model was trained on.

Upvotes: 2

Views: 1592

Answers (1)

gojomo

Reputation: 54173

If you supply a list of words as the positive argument to most_similar(), it will report words closest to their mean (which would seem to be one reasonable interpretation of the words' 'center').

For example:

sims = model.most_similar(positive=['chimichanga', 'taco', 'burrito'])

(I somewhat doubt the top result sims[0] here will be either 'mexico' or 'food'; it's most likely to be another Mexican-food word. Word2vec vectors don't necessarily encode a "more generic"/hypernym relation, either between words or along particular directions, but some other embedding techniques, such as hyperbolic embeddings, might provide that.)
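
To make the "closest to their mean" idea concrete, here is a minimal pure-Python sketch of the same computation. It does not use gensim: the toy vectors and the helper names unit() and closest_to_mean() are made up for illustration, and the sketch assumes the mean is taken over unit-length vectors and ranked by cosine similarity, with the input words left out of the results.

```python
import math

# Hypothetical 3-dimensional toy vectors, invented for illustration only.
# A real model would have hundreds of dimensions learned from a corpus.
vectors = {
    'chimichanga': [0.90, 0.10, 0.00],
    'taco':        [0.80, 0.20, 0.10],
    'burrito':     [0.85, 0.15, 0.05],
    'quesadilla':  [0.80, 0.20, 0.00],
    'food':        [0.50, 0.50, 0.10],
    'car':         [0.00, 0.10, 0.90],
}

def unit(v):
    """Scale a vector to length 1 so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def closest_to_mean(words, vectors, topn=10):
    """Rank all other words by cosine similarity to the mean of `words`."""
    units = [unit(vectors[w]) for w in words]
    # Component-wise average of the unit vectors, then re-normalized.
    mean = unit([sum(col) / len(units) for col in zip(*units)])
    scored = [
        (w, sum(a * b for a, b in zip(mean, unit(v))))
        for w, v in vectors.items()
        if w not in words  # skip the input words themselves
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

sims = closest_to_mean(['chimichanga', 'taco', 'burrito'], vectors)
print(sims)
```

With a real model the single most_similar(positive=[...]) call above is all you need. As far as I know, in recent gensim versions a full Word2Vec model exposes that method on its model.wv attribute rather than on the model itself.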

Upvotes: 3
