Reputation: 528
I've just started using FastText. I'm cross-validating a small dataset, using my dataset's .csv file as input. To process the dataset I'm using these parameters:
model = fasttext.train_supervised(input=train_file,
                                  lr=1.0,
                                  epoch=100,
                                  wordNgrams=2,
                                  bucket=200000,
                                  dim=50,
                                  loss='hs')
However, I would like to use the pre-trained embeddings from Wikipedia available on the FastText website. Is that feasible? If so, do I have to add a specific parameter to the parameter list?
Upvotes: 5
Views: 9365
Reputation: 3536
If your training dataset is small, you can start from FastText pretrained vectors, so that the classifier starts with some preexisting knowledge. This may or may not improve the classifier's performance: you should run some tests.
You can download pretrained vectors (.vec files) from this page.
These vectors have dimension 300, so dim has to be set to 300 to match them. You can train your model by doing:
model = fasttext.train_supervised(input=TRAIN_FILEPATH, lr=1.0, epoch=100,
                                  wordNgrams=2, bucket=200000, dim=300, loss='hs',
                                  pretrainedVectors=VECTORS_FILEPATH)
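Since dim has to agree with the pretrained vectors, it can help to check the file before training. This is only a minimal sketch: it assumes the text .vec format, whose first line holds the vocabulary size and the vector dimension separated by a space. The helper names read_vec_header and check_dim are made up for illustration and are not part of the fasttext package.

```python
def read_vec_header(path):
    """Return (vocab_size, dim) read from the first line of a text .vec file.

    Assumes the first line looks like "<vocab_size> <dim>".
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        header = f.readline().split()
    if len(header) != 2:
        raise ValueError(f"unexpected .vec header: {header!r}")
    return int(header[0]), int(header[1])


def check_dim(path, dim):
    """Raise ValueError if the vectors in `path` do not have dimension `dim`."""
    _, file_dim = read_vec_header(path)
    if file_dim != dim:
        raise ValueError(
            f"{path} has dimension {file_dim}, but dim={dim} was requested"
        )
    return file_dim
```

Calling check_dim(VECTORS_FILEPATH, 300) before train_supervised would then fail early with a readable message if the file and the dim argument disagree.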
You probably don't need to change the vector dimension. But if you have to, you can think about making this change in three steps:
Upvotes: 11
Reputation: 54243
I've not noticed any mention in the Facebook FastText docs of preloading a model before supervised-mode training, nor have I seen any working example that purports to do so.
Further, as the goals of word-vector training are different in unsupervised mode (predicting neighbors) and supervised mode (predicting labels), I'm not sure there'd be any benefit to such an operation.
Even if the word-vectors gave training a slight head-start, ultimately you'd want to run the training for enough epochs to 'converge' the model to as-good-as-it-can-be at its training task, predicting labels. And, by that point, any remaining influence of the original word-vectors may have diluted to nothing, as they were optimized for another task.
Why do you want to do this? In what way was typical supervised training on your data insufficient, and what benefit would you expect from starting from word-vectors from some other mode and dataset?
Upvotes: 2