Reputation: 1
I am using Stanford's DeepDive project to annotate a huge list of public complaints about specific vehicles. My project uses the problem descriptions to teach DeepDive how to categorize the problems based on the words in their sentences. For example, if a customer wrote something like "airbag malfunctioned", DeepDive should be able to tell that this is a safety issue and that the customer is talking about a part of the car. So I am trying to extend Stanford CoreNLP's Named Entity Recognition (NER) so that it also finds words like these and labels them with tags such as "CAR SAFETY ISSUE". Could anybody explain in depth how to add a new annotator so that CoreNLP can analyze these sentences for car parts and general issues? Thank you.
Upvotes: 0
Views: 542
Reputation: 5749
@Blaise is correct that this sounds like a good fit for TokensRegex. However, if you do want to create a custom annotator, the process is laid out in the CoreNLP FAQ: http://nlp.stanford.edu/software/corenlp-faq.shtml#custom
At a high level, you want to create a class that implements the Annotator interface and has a 2-argument constructor MyClass(String name, Properties props). Then, in the properties file you pass into CoreNLP, specify customAnnotatorClass.your_annotator_name = your.annotator.Class and add your_annotator_name to the annotators list. You can pass properties to this annotator in the usual way, by specifying your_annotator_name.key = value.
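A minimal sketch of that wiring in plain Java, showing the constructor contract and the property naming convention. Everything named here is hypothetical: the annotator name carIssues, the class com.example.CarIssueAnnotator, the carIssues.keywords property and the file names. To keep it compiling with the JDK alone, the class deliberately does not implement CoreNLP's Annotator interface; a real annotator would add that, plus annotate(), requires() and requirementsSatisfied().

```java
import java.util.Properties;

public class CarIssueConfigDemo {

    // Hypothetical annotator skeleton. A real one would also declare
    // "implements edu.stanford.nlp.pipeline.Annotator" and its methods;
    // they are omitted so this sketch runs without CoreNLP on the classpath.
    static class CarIssueAnnotator {
        final String keywordFile;

        // The 2-argument constructor: name is the annotator's name as it
        // appears in the properties ("carIssues" below).
        CarIssueAnnotator(String name, Properties props) {
            // Read "<name>.keywords", falling back to a default if it is absent.
            this.keywordFile = props.getProperty(name + ".keywords", "car_issues.txt");
        }
    }

    static Properties pipelineProps() {
        Properties props = new Properties();
        // The custom annotator's name must also appear in the annotators list.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,carIssues");
        // Map the name to the implementing class (hypothetical package).
        props.setProperty("customAnnotatorClass.carIssues", "com.example.CarIssueAnnotator");
        // Annotator-specific setting, read back in the constructor.
        props.setProperty("carIssues.keywords", "car_parts.txt");
        return props;
    }

    public static void main(String[] args) {
        Properties props = pipelineProps();
        CarIssueAnnotator annotator = new CarIssueAnnotator("carIssues", props);
        System.out.println(annotator.keywordFile); // prints car_parts.txt
        // With CoreNLP on the classpath, the pipeline would then be built with:
        // new edu.stanford.nlp.pipeline.StanfordCoreNLP(props);
    }
}
```

Passing the name into the constructor is what lets one class be registered several times under different names, each reading its own name.key settings.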
Upvotes: 1
Reputation: 43
Have you looked at the TokensRegexAnnotator? With its rules you can match expressions like these and annotate the matched tokens with a custom NER tag:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }

{
  ruleType: "tokens",
  pattern: ( /airbag/ /malfunctioned/ ),
  action: Annotate($0, ner, "CAR SAFETY ISSUE")
}
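To make the rule's behavior concrete, here is a plain-Java simulation of it. This is an illustration, not CoreNLP's implementation: each /regex/ must match a whole token at consecutive positions, and Annotate($0, ...) writes the tag onto every token of the full match $0. Simplifying assumptions in this sketch: tokens start out tagged O instead of carrying tags from the ner annotator, matching is case-sensitive, and every matching position is tagged, overlaps included.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenRuleDemo {

    // Applies one token-sequence rule: pattern k must fully match the word at
    // position start + k; on a match, every token in the span receives tag.
    static String[] annotate(List<String> words, List<Pattern> rule, String tag) {
        String[] ner = new String[words.size()];
        Arrays.fill(ner, "O"); // simplification: no pre-existing NER tags
        for (int start = 0; start + rule.size() <= words.size(); start++) {
            boolean matched = true;
            for (int k = 0; k < rule.size(); k++) {
                if (!rule.get(k).matcher(words.get(start + k)).matches()) {
                    matched = false;
                    break;
                }
            }
            if (matched) {
                // Equivalent of Annotate($0, ner, tag): tag the whole match.
                for (int k = 0; k < rule.size(); k++) {
                    ner[start + k] = tag;
                }
            }
        }
        return ner;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("The", "airbag", "malfunctioned", "yesterday", ".");
        List<Pattern> rule = Arrays.asList(Pattern.compile("airbag"), Pattern.compile("malfunctioned"));
        System.out.println(Arrays.toString(annotate(words, rule, "CAR SAFETY ISSUE")));
        // prints [O, CAR SAFETY ISSUE, CAR SAFETY ISSUE, O, O]
    }
}
```

To run the real rule file instead, a common setup (check it against your CoreNLP version) is to save the rules as, for example, car_issues.rules, append tokensregex to the annotators property after ner, and set tokensregex.rules=car_issues.rules. Broadening the per-token regexes (e.g. /airbags?/) is usually the first step toward covering more complaint phrasings.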
Upvotes: 1