I need to find something like the opposite of model.most_similar(). While most_similar() returns an array of words most similar to the one given as input, I need to find a sort of "center" of a list of words.
Is there a function in gensim, or any other tool, that could help me?
Example: given {'chimichanga', 'taco', 'burrito'}, the center might be mexico or food, depending on the corpus the model was trained on.
If you supply a list of words as the positive argument to most_similar(), it will report the words closest to their mean (which seems to be one reasonable interpretation of the words' 'center').
For example:
sims = model.most_similar(positive=['chimichanga', 'taco', 'burrito'])
(I somewhat doubt the top result, sims[0], will be either 'mexico' or 'food' here; it's most likely to be another Mexican-food word. There isn't necessarily a "more generic"/hypernym relation to be found, either between word2vec words or in certain directions... but some other embedding techniques, such as hyperbolic embeddings, might provide that.)
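To make "closest to their mean" concrete, here is a minimal pure-Python sketch of that kind of ranking. It is not gensim's actual code. Each input vector is scaled to unit length, the results are averaged and re-normalized, and every other word is scored by cosine similarity to that mean, skipping the query words themselves. The TOY_VECTORS table and the unit() and center_words() helpers are hypothetical, invented only for illustration. With a real trained model you would just call most_similar() as shown above.

```python
import math

# HYPOTHETICAL toy 3-D "embeddings", invented for illustration only.
# Real word2vec vectors have hundreds of dimensions and come from a trained model.
TOY_VECTORS = {
    "chimichanga": [0.9, 0.1, 0.0],
    "taco": [0.8, 0.2, 0.1],
    "burrito": [0.85, 0.15, 0.05],
    "enchilada": [0.88, 0.12, 0.02],
    "food": [0.6, 0.5, 0.1],
    "mexico": [0.5, 0.1, 0.8],
    "car": [0.0, 0.1, 0.99],
}


def unit(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def center_words(vectors, words, topn=3):
    """Rank words by cosine similarity to the mean of the unit-normalized inputs.

    Hypothetical helper: normalize each input vector, average them,
    re-normalize the mean, then score every non-input word against it.
    """
    normed = [unit(vectors[w]) for w in words]
    dim = len(normed[0])
    mean = unit([sum(v[i] for v in normed) / len(normed) for i in range(dim)])
    scores = []
    for word, vec in vectors.items():
        if word in words:
            continue  # skip the query words themselves
        cos = sum(a * b for a, b in zip(mean, unit(vec)))
        scores.append((word, cos))
    # Highest cosine similarity first
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    for word, score in center_words(TOY_VECTORS, ["chimichanga", "taco", "burrito"]):
        print(f"{word}: {score:.3f}")
```

With these invented vectors the top hit is 'enchilada', another food word, rather than a more generic term. That matches the kind of result the parenthetical above anticipates, but real rankings depend entirely on the trained model.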