Reputation: 31
I am trying to find words that are similar to two different words at once. I know that FastText can return the words most similar to a single word, but is there a way to find a keyword that is similar to two keywords? For example, "apple" is similar to "orange" and also to "kiwi". So, given the two words "orange" and "kiwi", I would like to get "apple" or other fruits as suggestions. Is there a way to do this?
Upvotes: 0
Views: 723
Reputation: 359
I have used the Gensim Word2Vec implementation for such computations for years, and Gensim also has a FastText implementation: https://radimrehurek.com/gensim/models/fasttext.html
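As a rough sketch of how this might look in Gensim: the `most_similar` method of its keyed vectors accepts a list of `positive` words and ranks the vocabulary by cosine similarity to the mean of their vectors. The helper name `similar_to_both` is hypothetical, and the model path in the commented usage is an assumption, not something from this thread.

```python
def similar_to_both(vectors, word_a, word_b, topn=10):
    """Return (word, similarity) pairs for words close to both query words.

    `vectors` is expected to be a Gensim KeyedVectors-like object.
    Passing both words as `positive` makes Gensim compare every
    vocabulary entry against the mean of the two word vectors.
    """
    return vectors.most_similar(positive=[word_a, word_b], topn=topn)


# Possible usage (needs gensim installed and a downloaded FastText model;
# the file name below is an assumption):
# from gensim.models.fasttext import load_facebook_vectors
# vectors = load_facebook_vectors("cc.en.300.bin")
# print(similar_to_both(vectors, "orange", "kiwi"))
```

Note that Gensim returns `(word, similarity)` pairs, the opposite order from the `fasttext` package's `get_nearest_neighbors`.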
Upvotes: 0
Reputation: 3536
I don't think there is an out-of-the-box function for this.
In any case, you can try this simple approach: get the nearest neighbors of each of the two words, then take the intersection of the two sets.
A small note: this is a crude approach. If necessary, more sophisticated operations can be performed using the cosine similarity.
Code example:
import fasttext

# load the pretrained model
# (this example uses the Italian model)
model = fasttext.load_model('./ml_models/cc.it.300.bin')

# get the 100 nearest neighbors of each word of interest;
# each neighbor is a (cosine similarity, word) tuple
arancia_nn = model.get_nearest_neighbors('arancia', k=100)
kiwi_nn = model.get_nearest_neighbors('kiwi', k=100)

# keep only the words as sets (discard the cosine similarity)
arancia_nn_words = {word for _, word in arancia_nn}
kiwi_nn_words = {word for _, word in kiwi_nn}

# compute the intersection: words that are neighbors of both
common_similar_words = arancia_nn_words.intersection(kiwi_nn_words)
Example output (in Italian):
{'agrume',
'agrumi',
'ananas',
'arance',
'arancie',
'arancio',
'avocado',
'banana',
'ciliegia',
'fragola',
'frutta',
'lime',
'limone',
'limoni',
'mandarino',
'mela',
'mele',
'melograno',
'melone',
'papaia',
'papaya',
'pera',
'pompelmi',
'pompelmo',
'renetta',
'succo'}
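The note above says that cosine similarity allows something more refined. One possible sketch, which is an assumption and not part of the original approach, is to keep the similarity scores instead of discarding them and rank the common words by their weaker similarity to the two query words. The function name `rank_common_neighbors` is hypothetical. Its inputs are lists of `(similarity, word)` tuples, the shape returned by `get_nearest_neighbors`.

```python
def rank_common_neighbors(neighbors_a, neighbors_b):
    """Rank words that appear in both neighbor lists.

    Each argument is a list of (similarity, word) tuples. A common word
    is scored by the smaller of its two similarities, so it ranks high
    only if it is close to both query words.
    """
    sim_a = {word: sim for sim, word in neighbors_a}
    sim_b = {word: sim for sim, word in neighbors_b}
    common = sim_a.keys() & sim_b.keys()
    scored = [(min(sim_a[w], sim_b[w]), w) for w in common]
    # highest score first
    return sorted(scored, reverse=True)


# Possible usage with the variables from the code example above:
# ranked = rank_common_neighbors(arancia_nn, kiwi_nn)
```

Taking the minimum is only one choice. Using the mean of the two scores instead would favor words that are very close to one query word even if they are only loosely related to the other.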
Upvotes: 0