Reputation: 21
I am doing sentiment analysis on an Arabic Twitter dataset and have finished preprocessing the data. I want to use fastText to build a classifier, but I do not know how. I need some clear steps for loading my data and building the classifier. Any help?
Upvotes: 1
Views: 895
Reputation: 3536
I think the official tutorial will be useful for you: https://fasttext.cc/docs/en/supervised-tutorial.html. It explains the steps to follow.
Here are some details about data preparation, which the tutorial covers only briefly:
__label__firstlabel __label__secondlabel example text line
__label__thirdlabel other example text line
__label__firstlabel __label__fourthlabel another example text line
Each line of your dataset must start with one or more labels, each prefixed with __label__ (this is what the classifier learns from), followed by the text of the example.
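As a rough illustration of this format, here is a minimal Python sketch that turns (labels, text) pairs into fastText lines. The function names are hypothetical, and replacing whitespace inside a label with an underscore is my own assumption, made so that each label stays a single token.

```python
def to_fasttext_line(labels, text, prefix="__label__"):
    """Return one fastText training line: prefixed labels, then the text."""
    # Assumption: whitespace inside a label is replaced by "_" so that each
    # label remains a single whitespace-delimited token.
    tags = " ".join(prefix + "_".join(label.split()) for label in labels)
    # Collapse newlines and repeated spaces, since each example must fit
    # on exactly one line.
    clean_text = " ".join(text.split())
    return f"{tags} {clean_text}"


def write_fasttext_file(rows, path):
    """Write an iterable of (labels, text) pairs to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for labels, text in rows:
            handle.write(to_fasttext_line(labels, text) + "\n")
```

For example, to_fasttext_line(["positive"], "great service") gives the line __label__positive great service. The file is written as UTF-8, so Arabic text is kept as-is.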
The tutorial then splits the labeled file into a training set and a validation set:
head -n 12404 cooking.stackexchange.txt > cooking.train
tail -n 3000 cooking.stackexchange.txt > cooking.valid
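If you prefer Python over the command line, the same split followed by training could look roughly like this. It is only a sketch: it assumes the official fasttext Python module is installed, the helper functions and file names are hypothetical, and unlike head/tail it shuffles the lines first (my own choice, in case the file is sorted by label).

```python
import random


def split_dataset(lines, valid_fraction=0.2, seed=0):
    """Shuffle labeled lines and split them into (train, valid) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_valid = max(1, round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def write_lines(lines, path):
    """Write lines to a UTF-8 file, one example per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(line.rstrip("\n") + "\n" for line in lines)


def train_and_evaluate(train_path, valid_path, model_path="tweets_model.bin"):
    """Train a supervised model, print P@1 / R@1 on the validation set, save it."""
    import fasttext  # imported lazily; requires `pip install fasttext`

    model = fasttext.train_supervised(input=train_path)
    n_examples, precision, recall = model.test(valid_path)
    print(f"N={n_examples} P@1={precision:.3f} R@1={recall:.3f}")
    model.save_model(model_path)
    return model
```

After training, model.predict("some preprocessed tweet") returns the predicted labels together with their probabilities.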
One idea: if your corpus is not very big, you may be able to improve your model's performance by using the pretrained vectors provided by fastText (option pretrainedVectors: https://fasttext.cc/docs/en/options.html).
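A hedged sketch of how that option might be used from Python: pretrainedVectors expects a text .vec file, and the dim argument has to match the dimension of those vectors, so the helper below reads it from the file header. The function names are hypothetical, and the paths are whatever you downloaded (for instance the Arabic cc.ar.300.vec, if that is the file you chose).

```python
def read_vec_dim(vec_path):
    """Read the vector dimension from the header of a .vec text file."""
    # The first line of a fastText .vec file is "<vocab_size> <dim>".
    with open(vec_path, encoding="utf-8") as handle:
        header = handle.readline().split()
    return int(header[1])


def train_with_pretrained(train_path, vec_path):
    """Train a supervised model initialised from pretrained word vectors."""
    import fasttext  # imported lazily; requires `pip install fasttext`

    return fasttext.train_supervised(
        input=train_path,
        pretrainedVectors=vec_path,
        dim=read_vec_dim(vec_path),  # must equal the pretrained dimension
    )
```

Whether this actually helps depends on your data, so it is worth comparing the validation scores with and without the pretrained vectors.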
For more information on fastText, I suggest the book fastText Quick Start Guide by Joydeep Bhattacharjee (https://www.oreilly.com/library/view/fasttext-quick-start/9781789130997/)
Upvotes: 2