Reputation: 127
I have been buried under hundreds of emails containing travel information. One of my jobs is to save some of the information from these emails into our system database.
My plan is to automate this, which is why I started studying Stanford NER and information extraction (IE).
Here we go.
This is an example of my emails. It is not a proper sentence, and it even contains some code.
NO. PETER 17 HIGHSCHOOL/2TH/OPEN
LONDON,ENGLAND STY 12-13TH JUNE
NO. JAKE 12 HIGHSCHOOL/OPEN
LIVERPOOL,ENGLAND 12,13 JUNE
I only need the name, location, and dates from these, so I made my TSV training file:
NO O
. O
PETER PERSON
JAKE PERSON
17 O
12 O
HIGHSCHOOL O
2TH O
OPEN O
LONDON CITY
LIVERPOOL CITY
ENGLAND COUNTRY
12-13TH DATE
12 DATE
13 DATE
JUNE MONTH
This is my properties file (train/prop.txt):
trainFile = train/dummy-vess-corpus.tsv
serializeTo = dummy-ner-model-vess.ser.gz
map = word=0,answer=1
useClassFeature=true
useWord=true
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useSequences=true
usePrevSequences=true
maxLeft=1
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
useDisjunctive=true
I trained the model with this command:
java -cp "stanford-ner.jar:lib/*" -mx4g edu.stanford.nlp.ie.crf.CRFClassifier -prop train/prop.txt
This is the result I get on my example:
[('NO', 'O'), ('.', 'O'), ('PETER', 'O'), ('17', 'O'),
('HIGHSCHOOL2THOPEN', 'O'), ('LONDON', 'CITY'), (',', 'CITY'),
('ENGLAND','COUNTRY'), ('STY', 'DATE'), ('12-13TH', 'DATE'), ('JUNE', 'MONTH'),
('NO', 'O'), ('.', 'O'), ('JAKE', 'O'), ('12', 'O'), ('HIGHSCHOOLOPEN', 'O'),
('LIVERPOOL', 'O'), (',', 'O'), ('ENGLAND', 'COUNTRY'), ('12,13', 'DATE'), ('JUNE', 'MONTH')]
It does not work at all. I have been searching Google for how to do the training, but I can only find simple examples...
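To make the gap between this output and the fields I need easier to see, adjacent tokens that share a tag can be merged into one entity. This is only a minimal sketch, assuming the tagger result is a list of (token, tag) tuples like the one above; `extract_entities` is a hypothetical helper written for illustration and is not part of Stanford NER.

```python
from itertools import groupby

def extract_entities(tagged, skip=("O",)):
    """Merge runs of adjacent tokens that share a tag into (text, tag) pairs."""
    entities = []
    for tag, run in groupby(tagged, key=lambda pair: pair[1]):
        if tag in skip:
            continue  # ignore tokens the model marked as non-entities
        entities.append((" ".join(tok for tok, _ in run), tag))
    return entities

# First record of the tagger output shown above
tagged = [('NO', 'O'), ('.', 'O'), ('PETER', 'O'), ('17', 'O'),
          ('HIGHSCHOOL2THOPEN', 'O'), ('LONDON', 'CITY'), (',', 'CITY'),
          ('ENGLAND', 'COUNTRY'), ('STY', 'DATE'), ('12-13TH', 'DATE'),
          ('JUNE', 'MONTH')]

print(extract_entities(tagged))
# [('LONDON ,', 'CITY'), ('ENGLAND', 'COUNTRY'), ('STY 12-13TH', 'DATE'), ('JUNE', 'MONTH')]
```

Grouped this way, the problems stand out: PETER never appears as a person, the comma is pulled into the city, and STY is pulled into the date.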
Upvotes: 0
Views: 183
Reputation: 119
Each line in the dummy-vess-corpus.tsv file must use one of the following annotators:
location
time
organization
percent
money
person
date
For example, the dummy-vess-corpus.tsv file should look like this:
NO O
. O
PETER person
JAKE person
LONDON location
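As far as I understand the column format that CRFClassifier reads with `map = word=0,answer=1`, each row is a token and its label separated by a tab, tokens are listed in the order they occur in the text, and a blank line separates one record from the next. The sketch below writes the two sample emails in that shape. The helper name `to_training_tsv` and the specific label choices (for example, labelling ENGLAND as location and JUNE as date) are my own assumptions for illustration, not something taken from your data.

```python
def to_training_tsv(records):
    """Build training text: one 'token<TAB>label' row per token, in email order,
    with a blank line between records."""
    blocks = []
    for record in records:
        blocks.append("\n".join(f"{tok}\t{label}" for tok, label in record))
    return "\n\n".join(blocks) + "\n"

# Hand-labelled tokens for the two sample emails (label choices are assumptions)
records = [
    [("NO", "O"), (".", "O"), ("PETER", "person"), ("17", "O"),
     ("HIGHSCHOOL", "O"), ("2TH", "O"), ("OPEN", "O"),
     ("LONDON", "location"), (",", "O"), ("ENGLAND", "location"),
     ("STY", "O"), ("12-13TH", "date"), ("JUNE", "date")],
    [("NO", "O"), (".", "O"), ("JAKE", "person"), ("12", "O"),
     ("HIGHSCHOOL", "O"), ("OPEN", "O"),
     ("LIVERPOOL", "location"), (",", "O"), ("ENGLAND", "location"),
     ("12", "date"), (",", "O"), ("13", "date"), ("JUNE", "date")],
]

tsv_text = to_training_tsv(records)

# Hypothetical output path, matching trainFile in the question's properties:
# with open("train/dummy-vess-corpus.tsv", "w", encoding="utf-8") as f:
#     f.write(tsv_text)
```

Keeping the tokens in context matters here because the same token can need different labels: in the second email, the first 12 is an age (O) and the second 12 is part of the date.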
If you want to add a new annotator, you can look at this link.
Upvotes: 1