Reputation: 69
I'm fairly new to NLP and I'm trying to build a POS tagger for the Sinhala language. Are there any specific steps to follow to build such a system?
Upvotes: 4
Views: 4291
Reputation: 2444
The most common approach is to use labeled data to train a supervised machine learning algorithm. If you want to go that route, check this tutorial on training your own POS tagger. You will need a POS tagset and a tagged corpus to build a POS tagger in a supervised fashion.
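To make the supervised setup concrete, here is a minimal sketch of a most-frequent-tag baseline, which is usually the first thing to try before a real learning algorithm. The corpus below is a toy English placeholder standing in for a tagged Sinhala corpus, and the tag names (`DET`, `NOUN`, `VERB`) and function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Toy labeled corpus: placeholder English sentences standing in for a
# tagged Sinhala corpus. The tagset here is hypothetical.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]


def train_baseline(sentences):
    """Learn the most frequent tag per word, plus a fallback tag."""
    word_tag_counts = defaultdict(Counter)
    tag_counts = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            word_tag_counts[word][tag] += 1
            tag_counts[tag] += 1
    # For each known word, keep only its most frequent tag.
    lexicon = {
        word: counts.most_common(1)[0][0]
        for word, counts in word_tag_counts.items()
    }
    # Unknown words fall back to the most frequent tag overall.
    default_tag = tag_counts.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag each token with its most frequent training tag."""
    return [(token, lexicon.get(token, default_tag)) for token in tokens]


lexicon, default_tag = train_baseline(TRAIN)
print(tag(["a", "cat", "barks"], lexicon, default_tag))
# [('a', 'DET'), ('cat', 'NOUN'), ('barks', 'VERB')]
```

A baseline like this ignores context entirely, so it mainly serves as a sanity check on the corpus and tagset and as a score that a trained model should beat.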
On the other hand, you can try unsupervised or semi-supervised methods. I found this semi-supervised method specifically for Sinhala: "Hidden Markov Model Based Part of Speech Tagger for Sinhala Language". Keep in mind that semi-supervised learning is close to unsupervised learning but still needs some labels: you don't have to make the big effort of tagging an entire corpus, but a small tagged sample is required. Finally, there are some completely unsupervised alternatives you can adapt to Sinhala.
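For intuition about how an HMM tagger works, here is a rough sketch of a bigram HMM with Viterbi decoding. It is not the method from the paper mentioned above: the probabilities are estimated from labeled counts with add-one smoothing, and the semi-supervised step (re-estimating on unlabeled text) is left out. The toy English corpus, the tagset, and all function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy labeled corpus: placeholder for a small tagged Sinhala sample.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("chase", "VERB"), ("cats", "NOUN")],
]
START = "<s>"  # pseudo-tag marking the start of a sentence


def train_hmm(sentences, alpha=1.0):
    """Count tag transitions and word emissions from labeled data."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab, alpha


def log_prob(counter, key, n_outcomes, alpha):
    """Additively smoothed log probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))


def viterbi(tokens, model):
    """Return the most probable tag sequence for `tokens`."""
    if not tokens:
        return []
    trans, emit, tags, vocab, alpha = model
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unknown words
    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{
        t: (log_prob(trans[START], t, n_tags, alpha)
            + log_prob(emit[t], tokens[0], n_words, alpha), None)
        for t in tags
    }]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            prev_tag, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], t, n_tags, alpha))
                 for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_prob(emit[t], tokens[i], n_words, alpha),
                         prev_tag)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


model = train_hmm(TRAIN)
print(viterbi(["the", "dog", "sleeps"], model))
# ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences, and the smoothing keeps unseen words and transitions from getting zero probability.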
Good luck!
Upvotes: 3
Reputation: 2824
Here is one way of doing it with a neural network. You will need a lot of samples already labeled with POS tags. Then you can use the samples to train an RNN. The x input to the RNN will be the sequence of tokens (words) and the y output will be the sequence of POS tags, one per token. The RNN, once trained, can be used as a POS tagger. Good RNN tutorials, such as the ones from WildML, are worth reading.
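As a sketch of what the x and y described above might look like in practice, the snippet below turns tagged sentences into padded integer sequences, which is the form most RNN libraries expect. It covers only the data preparation, not the network itself. The toy English sentences, the `<pad>` and `<unk>` symbols, and the helper names are all assumptions for illustration.

```python
PAD = "<pad>"  # filler so that all sequences share one length
UNK = "<unk>"  # stand-in for words not seen during training

# Toy labeled samples: placeholder for a tagged Sinhala corpus.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("dogs", "NOUN"), ("sleep", "VERB")],
]


def build_vocab(items, specials):
    """Map each distinct item to an integer id, special symbols first."""
    vocab = {symbol: index for index, symbol in enumerate(specials)}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab)
    return vocab


def encode(sentences, word2id, tag2id, max_len):
    """Turn tagged sentences into padded id sequences (x) and tag ids (y)."""
    xs, ys = [], []
    for sentence in sentences:
        words = [word2id.get(w, word2id[UNK]) for w, _ in sentence][:max_len]
        tags = [tag2id[t] for _, t in sentence][:max_len]
        padding = max_len - len(words)
        xs.append(words + [word2id[PAD]] * padding)
        ys.append(tags + [tag2id[PAD]] * padding)
    return xs, ys


word2id = build_vocab((w for s in TRAIN for w, _ in s), [PAD, UNK])
tag2id = build_vocab((t for s in TRAIN for _, t in s), [PAD])
x, y = encode(TRAIN, word2id, tag2id, max_len=3)
print(x)  # [[2, 3, 4], [5, 6, 0]]
print(y)  # [[1, 2, 3], [2, 3, 0]]
```

Each row of `x` lines up position by position with the same row of `y`, so the network predicts one tag id per token; padded positions are normally masked out of the loss.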
Upvotes: 2