Reputation: 1262
I am trying to train my NER model with the training set shown below.
British B-company
Broadcasting I-company
Corporation I-company
British nationality
public B-orgTpye
service I-orgType
broadcaster I-orgTpye
headquartered HQ
London city
Newyork city
American B-company
Airlines I-company
Jaguar auto
Mercedes auto
McLaren auto
When I run my CRF classifier, it does not recognize the B- and I- prefixes. It treats them as separate token labels.
Below is my code for the classifier.
String[] String2StringArray = { "The British Broadcasting Corporation is a British public service broadcaster headquartered at Broadcasting House in London" };
Properties props = new Properties();
String basedir = ModelLocation;
props.setProperty("ner.model", customModelFile);
props.setProperty("ner.model", basedir);
props.setProperty("ner.combinationMode", "HIGH_RECALL");
props.setProperty("ner.useSUTime", "true");
props.setProperty("sutime.includeRange", "true");
props.setProperty("ner.applyNumericClassifiers", "true");
StringBuilder classifierOutputAsString = new StringBuilder();
/*Combining different classifier models*/
//NERClassifierCombiner classifierCombiner = new NERClassifierCombiner(props);
NERClassifierCombiner classifierCombiner = new NERClassifierCombiner(true, true, GenericNERModel_A, customModelFile);
for (String str : String2StringArray) {
String classifiedToken = classifierCombiner.classifyWithInlineXML(str);
classifierOutputAsString.append(classifiedToken);
}
System.out.println(classifierOutputAsString.toString());
The output is shown below:
The <ORGANIZATION>British Broadcasting Corporation</ORGANIZATION> is a <nationality>British</nationality> <B-orgTpye>public</B-orgTpye> <I-orgType>service</I-orgType> <I-orgTpye>broadcaster</I-orgTpye> <HQ>headquartered</HQ> <city>at</city> <ORGANIZATION>Broadcasting House</ORGANIZATION> in <LOCATION>London</LOCATION>
Upvotes: 2
Views: 616
Reputation: 1262
Based on a previous answer on SO by Christopher Manning, I added these properties:
props.setProperty("entitySubclassification", "IOB1");
props.setProperty("retainEntitySubclassification", "true");
props.setProperty("mergeTags", "true");
Now it uses the IOB type of encoding, so the B- and I- labels are merged into a single entity.
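For anyone unsure what "merging" B-/I- labels means in practice, here is a minimal standalone sketch of the idea. It is not Stanford NER's implementation; the class name `IobMerger` and the `merge` method are hypothetical, and the rule it applies (a B- label opens a span, a matching I- label continues it, an un-prefixed label is a one-token entity, `O` is outside any entity) is my assumption about how the tags are meant to be read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Stanford NER.
public class IobMerger {

    /** Strips a leading "B-" or "I-" prefix, leaving the bare entity type. */
    public static String baseType(String label) {
        return (label.startsWith("B-") || label.startsWith("I-")) ? label.substring(2) : label;
    }

    /**
     * Groups tokens into entity spans formatted as "type: text".
     * An "I-" label extends the current span only when its type matches;
     * any other label closes the current span, and starts a new one unless it is "O".
     */
    public static List<String> merge(String[] tokens, String[] labels) {
        List<String> spans = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        String type = null; // type of the span currently being built, or null
        for (int i = 0; i < tokens.length; i++) {
            String label = labels[i];
            boolean continues = label.startsWith("I-") && baseType(label).equals(type);
            if (!continues) {
                if (type != null) {
                    spans.add(type + ": " + text); // close the previous span
                }
                text.setLength(0);
                type = label.equals("O") ? null : baseType(label);
            }
            if (type != null) {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(tokens[i]);
            }
        }
        if (type != null) {
            spans.add(type + ": " + text); // close the final span
        }
        return spans;
    }

    public static void main(String[] args) {
        String[] tokens = {"British", "Broadcasting", "Corporation", "is", "a", "British"};
        String[] labels = {"B-company", "I-company", "I-company", "O", "O", "nationality"};
        for (String span : merge(tokens, labels)) {
            System.out.println(span);
        }
    }
}
```

With the sample tokens above, the three company tokens come out as one `company` span and the second "British" as a separate `nationality` span, which is the behaviour I was expecting from the classifier.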
Upvotes: 2