Pelide

Reputation: 528

How to use pre-trained word vectors in FastText?

I've just started to use FastText. I'm doing cross-validation on a small dataset, using my dataset's .csv file as input. To process the dataset I'm using these parameters:

 model = fasttext.train_supervised(input=train_file,
                                   lr=1.0,
                                   epoch=100,
                                   wordNgrams=2,
                                   bucket=200000,
                                   dim=50,
                                   loss='hs')

However, I would like to use the pre-trained embeddings from Wikipedia available on the FastText website. Is this feasible? If so, do I have to add a specific parameter to the parameter list?

Upvotes: 5

Views: 9365

Answers (2)

Motivation

If your training dataset is small, you can start from FastText pretrained vectors, so the classifier begins with some pre-existing knowledge. This may or may not improve the classifier's performance: you should run some tests.

Training a fastText classifier, starting from pretrained vectors

You can download pretrained vectors (.vec files) from this page.

These vectors have dimension 300. You can train your model by doing:

model = fasttext.train_supervised(input=TRAIN_FILEPATH, lr=1.0, epoch=100,
                                  wordNgrams=2, bucket=200000, dim=300, loss='hs',
                                  pretrainedVectors=VECTORS_FILEPATH)
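
Once training completes, the model can be evaluated and queried as usual. A minimal sketch (VALID_FILEPATH is a hypothetical held-out file in fastText's labeled format):

# Evaluate on a held-out file: returns (N, precision@1, recall@1)
print(model.test(VALID_FILEPATH))

# Predict the label of a single piece of text
print(model.predict("example text to classify"))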

Change vectors dimension

You probably don't need to change the vectors' dimension. But if you have to, you can do it in three steps (a sketch follows the list):

  • Download .bin model (from here)
  • Reduce .bin model dimension (see this)
  • Convert .bin model to .vec file (see this answer)
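
A minimal sketch of these three steps using the fasttext.util helpers from the fasttext Python package (the 'en' language code and the target dimension of 50 are assumptions; pick whatever matches your data):

import fasttext
import fasttext.util

# Step 1: download a pretrained .bin model (English here, as an assumption)
fasttext.util.download_model('en', if_exists='ignore')
ft = fasttext.load_model('cc.en.300.bin')

# Step 2: reduce the model's dimension in place (300 -> 50 here)
fasttext.util.reduce_model(ft, 50)

# Step 3: write the reduced vectors out as a .vec text file,
# usable as the pretrainedVectors argument above
words = ft.get_words()
with open('cc.en.50.vec', 'w') as f:
    # .vec header line: vocabulary size and vector dimension
    f.write(f"{len(words)} {ft.get_dimension()}\n")
    for w in words:
        v = ft.get_word_vector(w)
        f.write(w + ' ' + ' '.join(str(x) for x in v) + '\n')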

Upvotes: 11

gojomo

Reputation: 54243

I've not noticed any mention in the Facebook FastText docs of preloading a model before supervised-mode training, nor have I seen any example code that purports to do so.

Further, as the goals of word-vector training are different in unsupervised mode (predicting neighbors) and supervised mode (predicting labels), I'm not sure there'd be any benefit to such an operation.

Even if the word-vectors gave training a slight head-start, ultimately you'd want to run the training for enough epochs to 'converge' the model to as-good-as-it-can-be at its training task, predicting labels. And, by that point, any remaining influence of the original word-vectors may have diluted to nothing, as they were optimized for another task.

Why do you want to do this? In what way was typical supervised training on your data insufficient, and what benefit would you expect from starting from word-vectors from some other mode and dataset?

Upvotes: 2
