Reputation: 11
I have a dataset in my native language. Can I train a Named Entity Recognition (NER) model for it? How should I proceed, and is there a tutorial that walks through building an NER model for a new language from scratch?
Upvotes: 0
Views: 64
Reputation: 214
You have several options. If you have a corpus of text in your language, you can train a word embedding model such as word2vec and use the resulting embeddings as input to a sequence model, e.g., a BiLSTM. If the corpus is very large, you can even pretrain a transformer-based model like BERT and then fine-tune it on your labeled NER dataset. If you don't have such a corpus, you can use a CRF model with hand-crafted features, such as the length of each word and whether it is capitalized (title-cased).
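To make the CRF option more concrete, here is a rough sketch of what hand-crafted features could look like in Python. The function names (`word2features`, `sent2features`), the feature keys, and the example sentence are made up for illustration and are not from any particular tutorial; the capitalization features only help if your language's script distinguishes upper and lower case.

```python
def word2features(sentence, i):
    """Build a feature dict for the token at position i of a tokenized sentence."""
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.length": len(word),        # length of the word
        "word.istitle": word.istitle(),  # first letter upper, rest lower
        "word.isupper": word.isupper(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],            # last three characters
    }
    # Context features from the previous token, or a begin-of-sentence flag.
    if i > 0:
        prev = sentence[i - 1]
        features["prev.lower"] = prev.lower()
        features["prev.istitle"] = prev.istitle()
    else:
        features["BOS"] = True
    # Context features from the next token, or an end-of-sentence flag.
    if i < len(sentence) - 1:
        nxt = sentence[i + 1]
        features["next.lower"] = nxt.lower()
        features["next.istitle"] = nxt.istitle()
    else:
        features["EOS"] = True
    return features


def sent2features(sentence):
    """Convert a list of tokens into a list of per-token feature dicts."""
    return [word2features(sentence, i) for i in range(len(sentence))]


# Hypothetical example sentence, already split into tokens.
example = ["Kofi", "lives", "in", "Accra"]
for token, feats in zip(example, sent2features(example)):
    print(token, feats)
```

One list of feature dicts per sentence, paired with one list of labels per sentence (e.g. BIO tags), is the input format that CRF libraries such as sklearn-crfsuite expect in `CRF.fit(X, y)`, so you could plug this in there once your dataset is tokenized and labeled.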
Upvotes: 1