Darshil Babel

Reputation: 145

Stanford NER: how to add our own tags in existing NER models?

I am trying to build my own NER classifier with my own tags. I tried training a model using the instructions at http://nlp.stanford.edu/software/crf-faq.shtml#j, but I do not have much training data. So I was wondering whether there is a way to add our own tags to existing classifiers such as english.all.3class.distsim.crf.ser, english.all.7class.distsim.crf.ser, etc., so that I only need to train the classifier on my own tags.

Please help me in this regard. Thank you in advance.

Upvotes: 0

Views: 1800

Answers (1)

Uday Sagar

Reputation: 480

You can use any tags you like (e.g. PERSON) by replacing the default ones (e.g. PERS) in the .tsv file. The classifier learns the tags supplied in the .tsv file, and a model trained on that file then annotates text with those same tags.

Take a part of the jane-austen-emma-ch1.tsv file (from http://nlp.stanford.edu/software/ner-example/jane-austen-emma-ch1.tsv) and put your own custom tags in it for training, as follows. I have used two tags: PERSON and ADJECTIVE.

CHAPTER	O
I	O
Emma	PERSON
Woodhouse	PERSON
,	O
handsome	ADJECTIVE
,	O
clever	ADJECTIVE
,	O
and	O
rich	ADJECTIVE
,	O
with	O
a	O
comfortable	ADJECTIVE
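The Stanford NER FAQ describes the training file as tab-separated columns, so stray spaces left by an editor are an easy mistake to make when hand-editing tags. A quick sanity check along these lines can catch that before training; `check_tsv` is a hypothetical helper name, not part of Stanford NER.

```shell
# check_tsv FILE: report every non-blank line that does not have exactly
# two tab-separated fields (token, label). Returns non-zero if any line fails.
# Blank lines are skipped, since they may be used to separate documents.
check_tsv() {
  awk -F'\t' '
    NF && NF != 2 {
      printf "line %d: expected 2 tab-separated fields, got %d\n", NR, NF
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

Calling `check_tsv your-training-file.tsv` prints nothing and returns 0 when every non-blank line has a token and a label.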

Now you can feed this .tsv file to the classifier (put the .tsv file's name in the .prop file) and generate the model as shown below:

java -cp "stanford-ner.jar:slf4j-api.jar" edu.stanford.nlp.ie.crf.CRFClassifier -prop ner.prop
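The command above expects ner.prop to exist already, but the answer does not show its contents. A minimal sketch, modeled on the example properties file in the Stanford NER FAQ, might look like the following. The training file name custom-tags.tsv is an assumption, serializeTo is chosen to match the model name loaded in the test command, and the feature flags are the FAQ's example values rather than settings tuned for this data.

```shell
# Write a minimal ner.prop for training. custom-tags.tsv is a hypothetical
# name for the training file; the feature flags are taken from the FAQ's
# example properties file, not tuned for this data.
cat > ner.prop <<'EOF'
# training file: one token per line, token and label separated by a tab
trainFile = custom-tags.tsv
# where to save the trained model; the .gz suffix gzips it
serializeTo = ner-model.ser.gz
# column 0 holds the word, column 1 holds the label
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
EOF
```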

Now let's test the model on a text file and see how it annotates it. Take the following text file (toBeAnnotated.txt):

CHAPTER O
I Emma Woodhouse, handsome, clever and rich, with a comfortable home and happy disposition, seemed to unite some of the best blessings

Running the following command annotates the above text file:

java -mx600m -cp "stanford-ner.jar:slf4j-api.jar" edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -textFile toBeAnnotated.txt -outputFormat inlineXML 2> /dev/null

The output I got is (I have added newlines for clarity):

I <PERSON>Emma Woodhouse</PERSON>,
<ADJECTIVE>handsome</ADJECTIVE>, <ADJECTIVE>clever</ADJECTIVE>
and <ADJECTIVE>rich</ADJECTIVE>, with a <ADJECTIVE>comfortable</ADJECTIVE>
home and happy <ADJECTIVE>disposition</ADJECTIVE>,
seemed to unite some of the best blessings

Upvotes: 1
