How to use pre-trained word vectors in FastText?

Question

I've just started using FastText. I'm running a cross-validation on a small dataset, using my dataset's .csv file as input. To process the dataset I'm using these parameters:

 model = fasttext.train_supervised(input=train_file,
                                   lr=1.0,
                                   epoch=100,
                                   wordNgrams=2,
                                   bucket=200000,
                                   dim=50,
                                   loss='hs')

However, I would like to use the pre-trained embeddings from Wikipedia available on the FastText website. Is that feasible? If so, do I have to add a specific parameter to the parameter list?

Stefano Fiorucci - anakin87 · Accepted Answer

Motivation

If your training dataset is small, you can start from FastText pretrained vectors, so that the classifier begins with some preexisting knowledge. This is not guaranteed to improve the classifier's performance: it could help or make no difference, so you should run some tests.
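One way to run such a test is a simple A/B comparison: train two classifiers with identical hyperparameters, one from scratch and one starting from pretrained vectors, and compare precision@1 on held-out data. This is only a sketch, assuming fastText's Python bindings, a validation file in the same `__label__` format as the training file, and 300-dimensional vectors. The function name `compare_pretrained` and its `trainer` argument are hypothetical; `trainer` exists only so the logic can be exercised without fastText installed.

```python
def compare_pretrained(train_file, valid_file, vectors_file, trainer=None):
    """Return (precision@1 without, precision@1 with) pretrained vectors.

    Hypothetical helper: `trainer` defaults to fasttext.train_supervised and
    is injectable only so the comparison can be tested with a stub.
    """
    if trainer is None:
        import fasttext  # imported lazily so a stub trainer needs no fastText
        trainer = fasttext.train_supervised

    # Same hyperparameters as in the question, except dim=300 to match the vectors.
    common = dict(input=train_file, lr=1.0, epoch=100, wordNgrams=2,
                  bucket=200000, dim=300, loss='hs')

    baseline = trainer(**common)
    pretrained = trainer(pretrainedVectors=vectors_file, **common)

    # model.test(path) returns (number of examples, precision@1, recall@1).
    _, p_baseline, _ = baseline.test(valid_file)
    _, p_pretrained, _ = pretrained.test(valid_file)
    return p_baseline, p_pretrained
```

With a small dataset, repeating this comparison over each of your cross-validation folds gives a more reliable picture than a single split.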

Training a fastText classifier, starting from pretrained vectors

You can download pretrained vectors (.vec files) from this page.

These vectors have dimension 300, so dim must be set to 300 as well. You can train your model by doing:

 import fasttext

 # dim must equal the dimension of the pretrained vectors (300 here)
 model = fasttext.train_supervised(input=TRAIN_FILEPATH, lr=1.0, epoch=100,
                                   wordNgrams=2, bucket=200000, dim=300, loss='hs',
                                   pretrainedVectors=VECTORS_FILEPATH)
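Since fastText refuses to train when dim does not match the dimension of the pretrained vectors, it can help to read that dimension from the file instead of hard-coding 300. A .vec file is plain text whose first line holds the number of words and the vector dimension; the minimal sketch below relies only on that header, and `vec_dimension` is a hypothetical helper name.

```python
def vec_dimension(path):
    """Return the vector dimension declared in the header of a .vec file."""
    # The first line of a .vec file is "<number of words> <dimension>".
    with open(path, encoding="utf-8") as f:
        _, dim = f.readline().split()
    return int(dim)
```

For example, you could pass `dim=vec_dimension(VECTORS_FILEPATH)` in the training call instead of `dim=300`.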

Change vectors dimension

You probably don't need to change the vectors' dimension. But if you have to, you can make the change in three steps:

Download the .bin model (from here)
Reduce the .bin model's dimension (see this)
Convert the .bin model to a .vec file (see this answer)
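The three steps can be combined into one short script. This is a sketch assuming the `fasttext.util` helpers `download_model` and `reduce_model` and the model methods `get_words`, `get_dimension` and `get_word_vector` from the official Python bindings. English ("en"), the target dimension of 100 and the output filename are only example choices, and `write_vec` is a hypothetical helper name. Note that `download_model` fetches the Common Crawl (cc.*) models, not the Wikipedia-only ones.

```python
def write_vec(model, path):
    """Write every word of a fastText model to a text .vec file."""
    words = model.get_words()
    with open(path, "w", encoding="utf-8") as out:
        # Header line: "<number of words> <dimension>".
        out.write(f"{len(words)} {model.get_dimension()}\n")
        for word in words:
            vector = " ".join(str(x) for x in model.get_word_vector(word))
            out.write(f"{word} {vector}\n")


def main():
    import fasttext
    import fasttext.util

    # 1. Download the .bin model (returns a filename such as "cc.en.300.bin").
    bin_path = fasttext.util.download_model("en", if_exists="ignore")
    model = fasttext.load_model(bin_path)

    # 2. Reduce the dimension in place (example target: 100).
    fasttext.util.reduce_model(model, 100)

    # 3. Convert to a .vec file, usable as pretrainedVectors with dim=100.
    write_vec(model, "cc.en.100.vec")


if __name__ == "__main__":
    main()
```

The full .bin model is large, so downloading it and writing out the whole vocabulary can take a while.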
