Reputation: 11
We need to add terms to named entity extraction tables/model in Stanford and can't figure out how. Use case - we need to build up a set of IED terms over time and want to the Stanford pipeline to extract the terms when found in text files.
Looking to see if this is something someone has done before
Upvotes: 1
Views: 272
Reputation: 8739
Could you specify the tags you want to apply?
To use the RegexNER all you have to do is build a file with 1 entry per line of the form:
TEXT_PATTERN\tTAG
You would put all of the things you want in your custom dictionary into a file, say custom_dictionary.txt
I am assuming by IED you mean
https://en.wikipedia.org/wiki/Improvised_explosive_device ??
So your file might look like:
VBIED\tIED_TERM
sticky bombs\tIED_TERM
RCIED\tIED_TERM
New Country\tLOCATION
New Person\tPERSON
(Note Stack Overflow has some strange formatting, there should not be blank lines between each entry, it should be 1 entry per line!!)
If you then run this command:
java -mx1g -cp '*' edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators 'tokenize,ssplit,pos,lemma,regexner,ner' -file sample_input.txt -regexner.mapping custom_dictionary.txt
you will tag sample_input.txt
Updating is merely a matter of updating custom_dictionary.txt
One thing to be on the lookout for, it matters if you put "ner" first or "regexner" first in your list of annotators.
If your highest priority is tagging with your specialized terms (for instance IED_TERM), I would run the regexner first in the pipeline, since there are some tricky issues with how the taggers overwrite each other.
Upvotes: 1
Reputation: 364
Please take a look at http://nlp.stanford.edu/software/regexner/ to see how to use it. It allows you to specify a file of mappings of phrases to entity types. When you want to update the mappings, you update the file and rerun the Stanford pipeline.
If you are interested in how to actually learn patterns for the terms over time, you can take a look at our pattern learning system: http://nlp.stanford.edu/software/patternslearning.shtml
Upvotes: 1