Leo Lee

Reputation: 127

How to train my own NER model with the Stanford libraries?

I receive hundreds of emails containing travel information. One of my tasks is to save some of the information from these emails into our system database.
My plan is to automate this, which is why I started studying Stanford NER and information extraction.

Here is an example email. It is not written in full sentences, and it even contains some codes.

sample email

NO. PETER 17 HIGHSCHOOL/2TH/OPEN
LONDON,ENGLAND STY 12-13TH JUNE

NO. JAKE 12 HIGHSCHOOL/OPEN
LIVERPOOL,ENGLAND 12,13 JUNE

I only need the name, location, and dates from these, so I made my TSV:

dummy-vess-corpus.tsv

NO  O
.   O
PETER   PERSON
JAKE    PERSON
17  O
12  O
HIGHSCHOOL  O
2TH O
OPEN    O
LONDON  CITY
LIVERPOOL   CITY
ENGLAND COUNTRY
12-13TH DATE
12  DATE
13  DATE
JUNE    MONTH

prop.txt

trainFile = train/dummy-vess-corpus.tsv
serializeTo = dummy-ner-model-vess.ser.gz
map = word=0,answer=1

useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true

build model cmd

java -cp "stanford-ner.jar:lib/*" -mx4g edu.stanford.nlp.ie.crf.CRFClassifier -prop train/prop.txt

output

[('NO', 'O'), ('.', 'O'), ('PETER', 'O'), ('17', 'O'), 
('HIGHSCHOOL2THOPEN', 'O'), ('LONDON', 'CITY'), (',', 'CITY'), 
('ENGLAND','COUNTRY'), ('STY', 'DATE'), ('12-13TH', 'DATE'), ('JUNE', 'MONTH'), 
('NO', 'O'), ('.', 'O'), ('JAKE', 'O'), ('12', 'O'), ('HIGHSCHOOLOPEN', 'O'), 
('LIVERPOOL', 'O'), (',', 'O'), ('ENGLAND', 'COUNTRY'), ('12,13', 'DATE'), ('JUNE', 'MONTH')]

It does not work at all. I have been searching Google for how to train a model, but I can only find simple examples...

Upvotes: 0

Views: 183

Answers (1)

Emre

Reputation: 119

For each line in the dummy-vess-corpus.tsv file, you must choose either O or one of the following annotators (entity labels):

location
time
organization
percent
money
person
date

For example, the dummy-vess-corpus.tsv file should look like this:

NO  O
.   O
PETER   person
JAKE    person
LONDON  location
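
A file like this can also be generated rather than typed by hand. The following is a minimal sketch, assuming the trainer reads one tab-separated token/label pair per line (word in column 0, answer in column 1, as in the question's `map` setting), with tokens kept in the order they appear in the email and a blank line between records. The tokenization and the labels chosen for the sample email are illustrative guesses, and `records` and `to_tsv` are hypothetical names.

```python
# Hypothetical labelled tokens for the two sample records, kept in email order.
# The tokenization and label choices here are assumptions for illustration only.
records = [
    [("NO", "O"), (".", "O"), ("PETER", "person"), ("17", "O"),
     ("HIGHSCHOOL/2TH/OPEN", "O"), ("LONDON", "location"), (",", "O"),
     ("ENGLAND", "location"), ("STY", "O"), ("12-13TH", "date"),
     ("JUNE", "date")],
    [("NO", "O"), (".", "O"), ("JAKE", "person"), ("12", "O"),
     ("HIGHSCHOOL/OPEN", "O"), ("LIVERPOOL", "location"), (",", "O"),
     ("ENGLAND", "location"), ("12,13", "date"), ("JUNE", "date")],
]


def to_tsv(records):
    """Render records as 'token<TAB>label' lines, with a blank line between records."""
    blocks = ["\n".join(f"{token}\t{label}" for token, label in record)
              for record in records]
    return "\n\n".join(blocks) + "\n"


# Print the training data; redirect or write it to train/dummy-vess-corpus.tsv.
print(to_tsv(records), end="")
```

Keeping the tokens in context, instead of listing each distinct word once, gives the previous-word and next-word features enabled in prop.txt something to learn from.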

If you want to add a new annotator, you can look at this link
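
Once the tagger returns a list of (token, label) pairs like the output in the question, the name, location, and date values can be collected by merging adjacent tokens that share a label. This is only a sketch: `group_entities` is a hypothetical helper, and the `tagged` list is made-up input in the same shape, not real model output.

```python
from itertools import groupby


def group_entities(tagged, ignore="O"):
    """Merge runs of adjacent tokens that share a label into (label, text) pairs."""
    entities = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label != ignore:
            entities.append((label, " ".join(token for token, _ in run)))
    return entities


# Made-up tagger output for illustration, using the labels from this answer.
tagged = [("NO", "O"), (".", "O"), ("PETER", "person"), ("17", "O"),
          ("LONDON", "location"), (",", "O"), ("ENGLAND", "location"),
          ("12-13TH", "date"), ("JUNE", "date")]

print(group_entities(tagged))
```

Tokens separated by an O token, such as LONDON and ENGLAND here, stay as separate entities, so any joining of city and country would have to happen in a later step.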

Upvotes: 1
