Isbister

Reputation: 946

How to find similar words with FastText?

I am playing around with FastText (https://pypi.python.org/pypi/fasttext), which is quite similar to Word2Vec. Since it seems to be a fairly new library without many built-in functions yet, I was wondering how to extract morphologically similar words.

For example: model.similar_word("dog") -> dogs. But there is no such built-in function.

If I type model["dog"], I only get the vector, which could be used to compute cosine similarity, e.g. model.cosine_similarity(model["dog"], model["dogs"]).

Do I have to write some sort of loop and compute cosine_similarity on all possible pairs in a text? That would take a long time!
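For reference, the brute-force loop I have in mind would look roughly like the sketch below. It uses plain Python and a hand-made dictionary of toy vectors in place of a real model; the names cosine_similarity, most_similar and toy_vectors are my own placeholders, not part of the fasttext package.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, vectors, topn=3):
    """Rank every other word by cosine similarity to the query word."""
    query_vec = vectors[query]
    scored = [
        (word, cosine_similarity(query_vec, vec))
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Toy 3-dimensional vectors standing in for real embeddings (assumption).
toy_vectors = {
    "dog": [1.0, 0.9, 0.0],
    "dogs": [0.9, 1.0, 0.1],
    "cat": [0.5, 0.2, 0.8],
    "car": [0.0, 0.1, 1.0],
}

print(most_similar("dog", toy_vectors, topn=2))
```

This is one pass over the vocabulary per query word rather than all pairs, but it still scales linearly with vocabulary size for every lookup.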

Upvotes: 15

Views: 27382

Answers (6)

mejobhoot

Reputation: 91

fastText has a method called get_nearest_neighbors for nearest-neighbor queries. You need the model's .bin file to use it.

Upvotes: 2

ChiaChong Lau

Reputation: 31

Use gensim:

from gensim.models import FastText

model = FastText.load(PATH_TO_MODEL)
model.wv.most_similar(positive=['dog'])

Upvotes: 3

Kalana Geesara

Reputation: 151

You can install the pyfasttext library to extract the most similar (nearest) words to a particular word.

from pyfasttext import FastText
model = FastText('model.bin')
model.nearest_neighbors('dog', k=2000)

Alternatively, install the latest development version of fasttext from the GitHub repository:

import fasttext
model = fasttext.load_model('model.bin')
model.get_nearest_neighbors('dog', k=100)

Upvotes: 11

You can install the gensim library and use it to extract the most similar words from a model that you downloaded from FastText.

Use this:

import gensim
model = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
similar = model.most_similar(positive=['man'],topn=10)

The topn parameter sets how many of the most similar words are returned (here, the top 10).

Upvotes: 7

far-zadeh

Reputation: 145

You should use gensim to load the model.vec and then get similar words:

import gensim
m = gensim.models.KeyedVectors.load_word2vec_format('model.vec')
m.most_similar(...)

Upvotes: 6

Snehal

Reputation: 757

Use Gensim: load the fastText-trained .vec file with load_word2vec_format and use the most_similar() method to find similar words.
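To see what is being loaded, here is a rough sketch of reading the .vec text layout by hand, assuming the usual word2vec text format: a header line with the vocabulary size and dimension, then one word followed by its vector components per line. The load_vec helper and the inline sample data are hypothetical and only for illustration; in practice Gensim does this for you.

```python
import io

def load_vec(fileobj):
    """Parse word2vec-style text vectors into a dict of word -> list of floats."""
    header = fileobj.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in fileobj:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, count, dim

# Made-up sample standing in for a real model.vec file (assumption).
sample = io.StringIO("2 3\ndog 0.1 0.2 0.3\ndogs 0.1 0.25 0.3\n")
vectors, count, dim = load_vec(sample)
print(count, dim, vectors["dog"])
```

With a real file you would pass open('model.vec', encoding='utf-8') instead of the StringIO sample.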

Upvotes: 16
