Brian Lee

Reputation: 31

How to find words that are similar to two keywords using FastText?

I am trying to find words that are similar to two different words. I know that FastText can return the words most similar to a single word, but is there a way to find a word that is similar to two keywords at once? For example, "apple" is similar to "orange" and also to "kiwi". So, given the two words "orange" and "kiwi", I would like to get "apple" or other fruits as suggestions. Is there a way to do this?

Upvotes: 0

Views: 723

Answers (2)

wishiknew

Reputation: 359

I have used the Gensim W2V implementation for such computations for years, but Gensim also has a FastText implementation: https://radimrehurek.com/gensim/models/fasttext.html

Upvotes: 0

I don't think there is an out-of-the-box function for this.

In any case, you can think about this simple approach:

  1. Load a pretrained embedding (available here)
  2. Get a decent number of nearest neighbors for each word of interest
  3. Search for intersections in the nearest neighbors of the two words

A small note: this is a crude approach. If necessary, more sophisticated operations can be performed using the cosine similarity.
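One possible reading of the cosine-similarity remark, separate from the intersection steps above, is to average the two word vectors and rank the other words by cosine similarity to that average. The sketch below is only an illustration under stated assumptions: the helper names `cosine` and `similar_to_both` are hypothetical, and the 3-dimensional vectors are made up. With a real FastText model the vectors would presumably come from `model.get_word_vector(word)` instead.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similar_to_both(vectors, word_a, word_b, topn=3):
    """Rank the other words by cosine similarity to the mean of the two query vectors."""
    query = [(a + b) / 2 for a, b in zip(vectors[word_a], vectors[word_b])]
    candidates = [w for w in vectors if w not in (word_a, word_b)]
    ranked = sorted(candidates, key=lambda w: cosine(query, vectors[w]), reverse=True)
    return ranked[:topn]

# Toy 3-dimensional vectors, made up for illustration only (assumption).
toy_vectors = {
    "orange": [0.9, 0.1, 0.0],
    "kiwi":   [0.8, 0.2, 0.1],
    "apple":  [0.85, 0.15, 0.05],
    "car":    [0.0, 0.1, 0.9],
    "organ":  [0.1, 0.9, 0.2],
}

result = similar_to_both(toy_vectors, "orange", "kiwi", topn=2)
print(result)
```

Averaging raw vectors lets the longer of the two dominate the query, so normalizing each vector to unit length before averaging may be worth trying.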

Code example:

import fasttext

# load the pretrained model
# (this example uses the Italian model)
model = fasttext.load_model('./ml_models/cc.it.300.bin')

# get the 100 nearest neighbors of each word of interest
arancia_nn = model.get_nearest_neighbors('arancia', k=100)
kiwi_nn = model.get_nearest_neighbors('kiwi', k=100)

# each neighbor is a (cosine similarity, word) pair; keep only the words
arancia_nn_words = {word for _, word in arancia_nn}
kiwi_nn_words = {word for _, word in kiwi_nn}

# compute the intersection
common_similar_words = arancia_nn_words.intersection(kiwi_nn_words)

Example output (in Italian):

{'agrume',
 'agrumi',
 'ananas',
 'arance',
 'arancie',
 'arancio',
 'avocado',
 'banana',
 'ciliegia',
 'fragola',
 'frutta',
 'lime',
 'limone',
 'limoni',
 'mandarino',
 'mela',
 'mele',
 'melograno',
 'melone',
 'papaia',
 'papaya',
 'pera',
 'pompelmi',
 'pompelmo',
 'renetta',
 'succo'}
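The plain intersection throws away the similarity scores, so the result is unordered. A small variant, sketched here as an assumption rather than anything FastText provides, keeps the scores and ranks each common word by the weaker of its two similarities. The helper `rank_common_neighbors` is hypothetical, and the neighbor lists are made-up values in the same `(score, word)` shape that `get_nearest_neighbors` returns.

```python
def rank_common_neighbors(nn_a, nn_b):
    """Intersect two (score, word) neighbor lists and rank by the weaker score."""
    scores_a = {word: score for score, word in nn_a}
    scores_b = {word: score for score, word in nn_b}
    common = scores_a.keys() & scores_b.keys()
    # A word only ranks high if it is close to BOTH query words.
    return sorted(common, key=lambda w: min(scores_a[w], scores_b[w]), reverse=True)

# Made-up neighbor lists for illustration only (assumption).
arancia_nn = [(0.71, "mandarino"), (0.65, "limone"), (0.60, "mela"), (0.55, "succo")]
kiwi_nn = [(0.70, "papaya"), (0.62, "mela"), (0.58, "limone"), (0.40, "succo")]

ranked = rank_common_neighbors(arancia_nn, kiwi_nn)
print(ranked)
```

Using `min` is one design choice among several; summing or multiplying the two scores would also work and would be more forgiving of words that are close to only one of the two keywords.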

Upvotes: 0
