Is it possible to fine-tune FastText models?

I'm working on a text similarity project using FastText. The basic example I have found for training a model is:

from gensim.models import FastText

model = FastText(tokens, size=100, window=3, min_count=1, iter=10, sorted_vocab=1)

As I understand it, since I'm specifying the vector and n-gram size, the model is being trained from scratch here, and if the dataset is small I would expect great results.

The other option I have found is to load the original Wikipedia model which is a huge file:

from gensim.models.wrappers import FastText

model = FastText.load_fasttext_format('wiki.simple')

My question is: can I load the Wikipedia model, or any other pretrained model, and fine-tune it with my dataset?

Upvotes: 4

Views: 7951

Answers (1)

Sam H.

Reputation: 4349

If you have a labelled dataset, then you should be able to fine-tune on it. This GitHub issue explains that you want to use the pretrainedVectors option: you start with the Wikipedia pretrained vectors, then train on your dataset. It seems that gensim can do this as well, but according to this GH issue, there have been some bugs.

Upvotes: 4
