Reputation: 10383
I'm working on a text similarity project using FastText. The basic example I have found for training a model is:
from gensim.models import FastText
model = FastText(tokens, size=100, window=3, min_count=1, iter=10, sorted_vocab=1)
As I understand it, since I'm specifying the vector and ngram size, the model is being trained from scratch here, and if the dataset is small I would expect great results.
The other option I have found is to load the original Wikipedia model which is a huge file:
from gensim.models.wrappers import FastText
model = FastText.load_fasttext_format('wiki.simple')
My question is: can I load the Wikipedia model, or any other pretrained model, and fine-tune it with my dataset?
Upvotes: 4
Views: 7951
Reputation: 4349
If you have a labelled dataset, then you should be able to fine-tune on it. This GitHub issue explains that you want to use the pretrainedVectors option: you start with the Wikipedia pretrained vectors, then train on your dataset. It seems that gensim can do this too, but according to this GH issue, there have been some bugs.
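For the gensim route, a minimal sketch of continued training might look like the code below. Note that this is unsupervised continued training of the embeddings, not the supervised pretrainedVectors path from the fastText command line. It assumes a gensim version that provides `load_facebook_model`, and that you have the full `.bin` model (the `.vec` file alone holds only the word vectors and cannot be trained further). The helper names `fine_tune` and `load_and_fine_tune` are made up for this example.

```python
def fine_tune(model, new_sentences, epochs=None):
    """Continue training an already-trained gensim FastText model.

    `new_sentences` is a list of token lists, e.g. [["hello", "world"], ...].
    """
    # Add the words from the new corpus to the existing vocabulary
    # instead of rebuilding the vocabulary from scratch.
    model.build_vocab(new_sentences, update=True)

    # Continue training; gensim requires total_examples and epochs
    # to be passed explicitly when calling train() directly.
    model.train(
        new_sentences,
        total_examples=len(new_sentences),
        epochs=epochs if epochs is not None else model.epochs,
    )
    return model


def load_and_fine_tune(path, new_sentences):
    """Load a Facebook-format .bin model and fine-tune it (hypothetical helper)."""
    # Imported here so that fine_tune() above can be used on its own.
    from gensim.models.fasttext import load_facebook_model

    model = load_facebook_model(path)  # expects the .bin file, not the .vec file
    return fine_tune(model, new_sentences)
```

With the tokenized corpus from the question, the call would be something like `model = load_and_fine_tune('wiki.simple.bin', tokens)`. Whether this actually helps depends on how much data you have and how different it is from Wikipedia, so compare the similarities before and after.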
Upvotes: 4