Kumal Pereira

Reputation: 69

Building a POS tagger for a new language

I'm fairly new to NLP and I'm trying to build a POS tagger for the Sinhala language. Are there specific steps to follow to build such a system?

Upvotes: 4

Views: 4291

Answers (2)

Jason Angel

Reputation: 2444

The most common approach is to use labeled data to train a supervised machine learning algorithm. If you want to go that route, check the tutorial "train your own POS tagger". To build a POS tagger in a supervised fashion you will need two things: a POS tagset and a corpus annotated with it.
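To make the supervised idea concrete, here is a minimal sketch of the simplest possible baseline: tag each word with the tag it received most often in the training corpus. The toy corpus, the tag names, and the function names `train_baseline` and `tag` are all placeholders I made up for illustration; a real tagger would load a Sinhala corpus annotated with your chosen tagset and use a stronger model.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical English placeholder data). A real tagger
# would load a Sinhala corpus annotated with the chosen POS tagset.
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def train_baseline(tagged_sentences):
    """Return (word -> most frequent tag, overall most frequent tag)."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in tagged_sentences:
        for word, pos in sentence:
            per_word[word][pos] += 1
            overall[pos] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag(tokens, lexicon, default_tag):
    """Tag known words from the lexicon; unseen words get the default tag."""
    return [(t, lexicon.get(t, default_tag)) for t in tokens]


lexicon, default_tag = train_baseline(TRAIN)
print(tag(["a", "cat", "barks"], lexicon, default_tag))
```

A baseline like this is mostly useful as a sanity check: any model you train later should beat it on held-out data.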

On the other hand, you can try unsupervised methods. I found this semi-supervised method built specifically for Sinhala: HIDDEN MARKOV MODEL BASED PART OF SPEECH TAGGER FOR SINHALA LANGUAGE. Keep in mind that semi-supervised learning is a variation of unsupervised learning, so although you do not have to go to the effort of tagging an entire corpus, some labels are still needed. Finally, there are some completely unsupervised alternatives that you can adapt to Sinhala.
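For a feel of how an HMM tagger works, here is a minimal sketch. Note that this is not the method from the paper above: it is a plain HMM estimated from tagged counts with add-one smoothing and decoded with Viterbi, whereas a semi-supervised variant would estimate part of the model from untagged text. The toy corpus and all function names are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical placeholder data, not Sinhala).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train_hmm(tagged_sentences):
    """Collect transition (tag -> next tag) and emission (tag -> word) counts."""
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, pos in sentence:
            trans[prev][pos] += 1
            emit[pos][word] += 1
            vocab.add(word)
            prev = pos
    return trans, emit, sorted(emit), vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under the given counts."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(tokens, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `tokens`."""
    if not tokens:
        return []
    n_words = len(vocab) + 1  # one extra slot for unknown words
    n_tags = len(tags)
    # best[i][t] = (log score of best path ending in tag t at position i, backpointer)
    best = [{
        t: (log_prob(trans[START], t, n_tags) + log_prob(emit[t], tokens[0], n_words), None)
        for t in tags
    }]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags), p) for p in tags
            )
            column[t] = (score + log_prob(emit[t], tokens[i], n_words), prev)
        best.append(column)
    # Follow the backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(tokens) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


trans, emit, tags, vocab = train_hmm(TRAIN)
print(viterbi(["a", "cat", "barks"], trans, emit, tags, vocab))
```

Because the transition counts carry information about tag order, the model can still make a reasonable guess for a word it has never seen, which the word-by-word baseline cannot do.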

Good luck!

Upvotes: 3

neurite

Reputation: 2824

Here is one way of doing it with a neural network. You will need a lot of samples that are already labeled with POS tags. You can then use those samples to train an RNN: the input x is the sequence of tokens (words) and the output y is the corresponding sequence of POS tags, one tag per token. Once trained, the RNN can be used as a POS tagger. Good RNN tutorials, such as the ones from WildML, are worth reading.
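As a rough illustration of what x and y look like before they reach the network, here is a sketch of the data preparation step only: words and tags are mapped to integer ids and every sentence is padded to the same length. The RNN itself would come from a deep learning library of your choice and is not shown. The toy corpus, the `<pad>` and `<unk>` symbols, and the function names are assumptions made for this example.

```python
PAD = "<pad>"  # filler so that all sequences have the same length
UNK = "<unk>"  # stands in for words not seen during training

# Toy tagged corpus (hypothetical placeholder data, not Sinhala).
TRAIN = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]


def build_vocab(items, specials):
    """Assign consecutive integer ids, special symbols first."""
    index = {s: i for i, s in enumerate(specials)}
    for item in items:
        if item not in index:
            index[item] = len(index)
    return index


def encode(sentences, word_to_id, tag_to_id, max_len):
    """Turn tagged sentences into padded id sequences (x for words, y for tags)."""
    xs, ys = [], []
    for sentence in sentences:
        words = [word_to_id.get(w, word_to_id[UNK]) for w, _ in sentence][:max_len]
        labels = [tag_to_id[t] for _, t in sentence][:max_len]
        padding = max_len - len(words)
        xs.append(words + [word_to_id[PAD]] * padding)
        ys.append(labels + [tag_to_id[PAD]] * padding)
    return xs, ys


word_to_id = build_vocab((w for s in TRAIN for w, _ in s), [PAD, UNK])
tag_to_id = build_vocab((t for s in TRAIN for _, t in s), [PAD])
x, y = encode(TRAIN, word_to_id, tag_to_id, max_len=4)
print(x)
print(y)
```

Each row of x lines up position by position with the same row of y, which is exactly the one-tag-per-token setup described above. Padded positions should be masked out when computing the training loss.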

Upvotes: 2
