I know that CoreNLP's RegexNER lets me overwrite a tag using the mapping file. For example, CoreNLP recognizes the word EGFR as an ORGANIZATION. If my mapping file contains the following line, it still tags EGFR as an ORGANIZATION:
EGFR GENE
If I change that line to look like the following:
EGFR GENE ORGANIZATION
Then CoreNLP tags it as a GENE.
To do this, though, I have to know that CoreNLP tags EGFR as an ORGANIZATION, and I can't know the original tag for every word in my mapping file. So my question is: is there a way to tell RegexNER to overwrite the tag for EGFR no matter what the original tag is? Something like:
EGFR GENE .*
Great answer by @StanfordNLPHelp.
However, if you configure your mappings through ner.fine (the RegexNER step built into the ner annotator), use the properties below to get the overriding behavior:
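import java.util.Properties;

Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, regexner");
// rulesFiles holds the path(s) to your mapping file(s)
props.setProperty("ner.fine.regexner.mapping", rulesFiles);
// The equivalent setting for the standalone regexner annotator would be:
// props.setProperty("regexner.backgroundSymbol", "ORGANIZATION,PERSON,LOCATION,MISC,O");
props.setProperty("ner.fine.regexner.backgroundSymbol", "ORGANIZATION,PERSON,LOCATION,MISC,O");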
In the third column of a mapping rule, you can provide a comma-separated list of tags that the rule is allowed to overwrite.
For instance:
ORGANIZATION,PERSON,LOCATION,MISC
will allow the rule to overwrite any of those tags.
I don't think there is an "overwrite all" option at the moment, so you do have to list each type you want overwritten.
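Since each rule needs that list, it can help to generate the mapping file instead of editing it by hand. A RegexNER mapping file is tab-separated, with the pattern, the NER type, and the comma-separated overwritable types as the first three columns. Below is a minimal sketch, assuming you build the file programmatically; the class name `MappingFileWriter`, the gene list, and the file name `gene_mapping.txt` are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class MappingFileWriter {
    // Tags every generated rule may overwrite (third, comma-separated column).
    static final String OVERWRITABLE = "ORGANIZATION,PERSON,LOCATION,MISC";

    // Build one tab-separated line per term: pattern, NER type, overwritable types.
    static List<String> buildLines(List<String> terms, String nerType) {
        List<String> lines = new ArrayList<>();
        for (String term : terms) {
            lines.add(String.join("\t", term, nerType, OVERWRITABLE));
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical gene list and output file name.
        List<String> lines = buildLines(List.of("EGFR", "BRCA1"), "GENE");
        Path out = Path.of("gene_mapping.txt");
        Files.write(out, lines, StandardCharsets.UTF_8);
        lines.forEach(System.out::println);
    }
}
```

Each written line has the same shape as the "EGFR GENE ORGANIZATION" rule from the question, just with the full list of classes in the third column.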
If you always want your rules to overwrite everything, you can set this option on the TokensRegexNERAnnotator:
regexner.backgroundSymbol ORGANIZATION,PERSON,LOCATION,MISC,O
Then the individual rules don't need their own list.
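The per-rule list and regexner.backgroundSymbol interact, so a small model can make it concrete. This is a deliberately simplified, hypothetical sketch, not CoreNLP's actual implementation. It assumes a rule relabels a token only when the token's current tag appears in the rule's own overwritable list or in the background-symbol list. The class `OverwriteModel` and its methods are illustrative names, and the background value "O" in the first two cases is an assumption chosen for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model of when a RegexNER rule may replace an existing tag.
// It is NOT CoreNLP's code; it only mirrors the behavior described in this thread.
public class OverwriteModel {
    // Parse a comma-separated tag list such as "ORGANIZATION,PERSON".
    static Set<String> parse(String csv) {
        Set<String> tags = new HashSet<>();
        if (csv == null || csv.isEmpty()) {
            return tags;
        }
        for (String t : csv.split(",")) {
            tags.add(t.trim());
        }
        return tags;
    }

    // Relabel if the current tag is in the rule's overwritable list
    // (third mapping-file column) or in the global background-symbol list.
    static String apply(String currentTag, String ruleType,
                        String ruleOverwritable, String backgroundSymbol) {
        Set<String> allowed = parse(ruleOverwritable);
        allowed.addAll(parse(backgroundSymbol));
        return allowed.contains(currentTag) ? ruleType : currentTag;
    }

    public static void main(String[] args) {
        // "EGFR GENE" with only O as background: ORGANIZATION is kept.
        System.out.println(apply("ORGANIZATION", "GENE", "", "O"));
        // "EGFR GENE ORGANIZATION": the rule itself may overwrite ORGANIZATION.
        System.out.println(apply("ORGANIZATION", "GENE", "ORGANIZATION", "O"));
        // "EGFR GENE" with a broad background list: ORGANIZATION is overwritten.
        System.out.println(apply("ORGANIZATION", "GENE", "",
                "ORGANIZATION,PERSON,LOCATION,MISC,O"));
    }
}
```

Running main prints ORGANIZATION, GENE, GENE. This matches the question's observations and shows why the background list includes O: without it, untagged tokens could not be relabeled.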