Training caseless NER models with Stanford CoreNLP

Question

I know how to train an NER model as specified here, and in fact I have a very successful one. I also know about the three provided caseless models discussed here. But what if I want to train my own caseless model? What is the trick? I have a set of all-uppercase documents for training. Do I use the same training process, do the caseless models use special or different features, or are there properties that need to be set? I can't find a description of how the provided caseless models were created.

Christopher Manning · Accepted Answer

Our caseless models differ in only one property: the classifier is told to invoke a function that removes case information before words are processed for classification. We do that with this property value (which also maps some words to American spelling):

wordFunction = edu.stanford.nlp.process.LowercaseAndAmericanizeFunction

but there is also simply:

wordFunction = edu.stanford.nlp.process.LowercaseFunction
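For context, here is a minimal sketch of where that line might sit in a training properties file. Only the wordFunction line comes from this answer. The file names are hypothetical placeholders, and the remaining feature flags are copied from the example properties file in the Stanford NER FAQ; in practice you would keep whatever settings your existing cased model already uses.

```properties
# Hypothetical file names; substitute your own training data and output path.
trainFile = train-uppercase.tsv
serializeTo = caseless-ner-model.ser.gz

# Column layout of the training file: token in column 0, label in column 1.
map = word=0,answer=1

# The one caseless-specific setting: strip case before feature extraction.
wordFunction = edu.stanford.nlp.process.LowercaseFunction

# Feature flags below are taken from the Stanford NER FAQ example,
# not from this answer; reuse the flags from your own working model instead.
useClassFeature = true
useWord = true
useNGrams = true
noMidNGrams = true
maxNGramLeng = 6
usePrev = true
useNext = true
useSequences = true
usePrevSequences = true
maxLeft = 1
useTypeSeqs = true
useTypeSeqs2 = true
useTypeySequences = true
wordShape = chris2useLC
useDisjunctive = true

# Train as usual (jar name depends on your distribution), e.g.:
#   java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop caseless.prop
```

Because the word function is applied to every token, the same properties file should work whether the training text is all uppercase, all lowercase, or mixed case.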

It would be nice to have more automatic detection of document format (hard/soft line breaks), case, or even language, but at present we don't have any of that.
