Eman Ahmed
Reputation: 21

How do I use the fastText library to build a text classifier?

I am doing sentiment analysis on an Arabic Twitter dataset and have finished preprocessing the data. I want to use fastText to build a classifier, but I do not know how. I need clear steps for loading my data and building the classifier. Any help?

Upvotes: 1

Views: 895

Answers (1)

I think the official tutorial can be useful for you: https://fasttext.cc/docs/en/supervised-tutorial.html. It explains the steps to follow.

Here are some details about data preparation, which the tutorial covers only briefly.

  • First, prepare your dataset in this format:

__label__firstlabel __label__secondlabel example text line
__label__thirdlabel other example text line
__label__firstlabel __label__fourthlabel another example text line

Each line of your dataset must start with one or more labels, each prefixed with __label__ (this is what the classifier learns to predict), followed by the text of the example.
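If your preprocessed tweets are already in Python, the format above can be produced with a small helper. This is only a minimal sketch: the function name and the sample tweets are hypothetical, and it assumes each example is a pair of (list of labels, text).

```python
def to_fasttext_line(labels, text):
    """Build one line in the format '__label__a __label__b some text'."""
    # fastText reads one example per line, so collapse newlines, tabs
    # and repeated spaces inside the text into single spaces.
    clean_text = " ".join(text.split())
    # A label must be a single token, so whitespace inside it becomes '_'.
    prefixed = ["__label__" + "_".join(label.split()) for label in labels]
    return " ".join(prefixed + [clean_text])


# Hypothetical sample data; replace with your own preprocessed tweets.
examples = [
    (["positive"], "الخدمة ممتازة"),
    (["negative"], "تجربة سيئة\nجدا"),
]

for labels, text in examples:
    print(to_fasttext_line(labels, text))
```

Writing these lines to a UTF-8 text file, one per line, gives a file that can be split and passed to fastText.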

  • Then, split the dataset into a training set and a validation set

The example in the tutorial is the following:

head -n 12404 cooking.stackexchange.txt > cooking.train
tail -n 3000 cooking.stackexchange.txt > cooking.valid
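Note that head and tail take the lines in file order. If your file is sorted (for example by label or by date), the two sets may not be representative. A possible alternative is to shuffle before splitting; the sketch below assumes a 20% validation share and a fixed seed, both of which are arbitrary choices, and the function names are hypothetical.

```python
import random


def split_lines(lines, valid_fraction=0.2, seed=42):
    """Shuffle the examples and return (train, valid) lists."""
    shuffled = list(lines)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def split_file(src_path, train_path, valid_path, valid_fraction=0.2, seed=42):
    """Read a fastText-formatted file and write shuffled train/valid files."""
    with open(src_path, encoding="utf-8") as f:
        # Drop empty lines and trailing newlines so shuffling cannot merge lines.
        lines = [line.rstrip("\n") for line in f if line.strip()]
    train, valid = split_lines(lines, valid_fraction, seed)
    for path, subset in ((train_path, train), (valid_path, valid)):
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in subset)
```

For example, split_file("tweets.txt", "tweets.train", "tweets.valid") would produce the two files used in the next step (the file names are placeholders).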

  • Then you can train your classifier, test it and make it better...
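The tutorial does this step with the command-line tool; the same can be sketched with the fasttext Python module (pip install fasttext). The file names and hyperparameter values below are assumptions for illustration, not recommended settings, and f1_score is a small helper added here, not part of fastText.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def train_and_evaluate(train_path, valid_path):
    """Train a supervised model and print its scores on the validation set."""
    # Imported here so the helper above can be used without fastText installed.
    import fasttext

    # epoch, lr and wordNgrams are illustrative starting values to tune.
    model = fasttext.train_supervised(
        input=train_path, epoch=25, lr=0.5, wordNgrams=2
    )
    # test() returns (number of examples, precision at 1, recall at 1).
    n, precision, recall = model.test(valid_path)
    print(f"examples={n} P@1={precision:.3f} R@1={recall:.3f} "
          f"F1={f1_score(precision, recall):.3f}")
    return model


# Example usage (placeholder file names):
# model = train_and_evaluate("tweets.train", "tweets.valid")
# print(model.predict("نص تجريبي"))   # (labels, probabilities)
# model.save_model("tweets_model.bin")
```

"Making it better" then mostly means changing those hyperparameters (or the preprocessing) and comparing the validation scores.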

An idea: if your corpus is not very big, you can improve your model's performance by using the pretrained word vectors provided by fastText (option pretrainedVectors: https://fasttext.cc/docs/en/options.html)
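One detail to watch: the dim option must match the dimension of the pretrained vectors, otherwise training fails. A rough sketch with the Python module follows. It assumes the vectors are in the text .vec format, whose first line is "<number of words> <dimension>"; the helper names are hypothetical, and any file name you pass (for instance a downloaded Arabic .vec file) should be checked against the fastText download page.

```python
def read_vec_dim(vec_path):
    """Return the dimension declared in the header of a .vec text file."""
    # The header line has two fields: word count and vector dimension.
    with open(vec_path, encoding="utf-8") as f:
        _, dim = f.readline().split()
    return int(dim)


def train_with_pretrained(train_path, vec_path):
    """Train a supervised model initialised from pretrained word vectors."""
    # Imported here so read_vec_dim works without fastText installed.
    import fasttext

    return fasttext.train_supervised(
        input=train_path,
        pretrainedVectors=vec_path,
        # Must equal the dimension of the vectors in vec_path.
        dim=read_vec_dim(vec_path),
    )
```

Reading the dimension from the file avoids hard-coding a value that may not match the vectors you downloaded.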

For more information on fastText, I suggest the book fastText Quick Start Guide by Joydeep Bhattacharjee (https://www.oreilly.com/library/view/fasttext-quick-start/9781789130997/)

Upvotes: 2
