demongolem


Training caseless NER models with Stanford CoreNLP

I know how to train an NER model as described here, and in fact I have a very successful one. I also know about the three provided caseless models discussed here. But what if I want to train my own caseless model? What is the trick there? I have a set of all-uppercase documents for training. Do I use the same training process, or are there special features for the caseless models, or properties that need to be set? I can't find a description of how the provided caseless models were created.

Answers (1)

Christopher Manning

Only one property differs in our caseless models: you set it so that the classifier invokes a function that removes case information from words before they are processed for classification. We do that with this property value (which also maps some words to American spelling):

wordFunction = edu.stanford.nlp.process.LowercaseAndAmericanizeFunction

But there is also the plain lowercasing function:

wordFunction = edu.stanford.nlp.process.LowercaseFunction
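To put this in context, the property can simply be added to an ordinary CRFClassifier training configuration. The sketch below is a minimal Java program that prints such a .prop file. It assumes a typical set of NER training flags (similar to the sample configuration in the Stanford NER FAQ); the class name, file names, and flag choices are placeholders, and the answer does not say whether the distributed caseless models used exactly these features. Only the wordFunction line is caseless-specific.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper that renders a CRFClassifier training .prop file for a
 * caseless NER model. Feature flags are assumed, typical values; only
 * wordFunction differs from a normal (cased) training setup.
 */
public class CaselessNerProps {

    // Boolean feature flags from a typical CRFClassifier NER configuration (assumed).
    private static final String[] TRUE_FLAGS = {
        "useClassFeature", "useWord", "useNGrams", "noMidNGrams", "usePrev",
        "useNext", "useSequences", "usePrevSequences", "useTypeSeqs",
        "useTypeSeqs2", "useTypeySequences", "useDisjunctive"
    };

    /** Builds the training properties, preserving insertion order. */
    public static Map<String, String> build(String trainFile, String modelFile) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("trainFile", trainFile);     // tab-separated columns: token, label
        props.put("serializeTo", modelFile);   // where the trained model is written
        props.put("map", "word=0,answer=1");   // column 0 = word, column 1 = NER label
        for (String flag : TRUE_FLAGS) {
            props.put(flag, "true");
        }
        props.put("maxNGramLeng", "6");
        props.put("maxLeft", "1");
        props.put("wordShape", "chris2useLC");
        // The caseless-specific setting: map every word to lowercase (and
        // American spelling) before features are computed. Use
        // edu.stanford.nlp.process.LowercaseFunction for lowercasing only.
        props.put("wordFunction", "edu.stanford.nlp.process.LowercaseAndAmericanizeFunction");
        return props;
    }

    /** Renders the properties in the "key = value" form used by .prop files. */
    public static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String trainFile = args.length > 0 ? args[0] : "uppercase-train.tsv";
        String modelFile = args.length > 1 ? args[1] : "caseless-ner-model.ser.gz";
        System.out.print(render(build(trainFile, modelFile)));
    }
}
```

Redirecting the output to a file (for example `java CaselessNerProps train.tsv model.ser.gz > caseless.prop`) gives a file that can be passed to training in the usual way, e.g. `java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop caseless.prop`; the jar name depends on which distribution you downloaded.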

It would be nice to have more automatic detection of document format (hard vs. soft line breaks), case, or even language, but at present we don't have any of that.
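Until such detection exists, one user-side workaround is to check the text yourself and send obviously all-uppercase documents to a caseless model. The following is a hypothetical Java heuristic, not part of CoreNLP: the class name, the uppercase-ratio measure, and the 0.9 threshold are assumptions to tune on your own data.

```java
/**
 * Hypothetical heuristic (not part of CoreNLP): guesses whether a document's
 * casing carries little information, e.g. all-uppercase text, so the caller
 * can choose a caseless NER model. The threshold is an assumption.
 */
public class CaseDetector {

    /** Fraction of letters that are uppercase; 0.0 if the text has no letters. */
    public static double upperCaseRatio(String text) {
        long letters = 0;
        long upper = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                letters++;
                if (Character.isUpperCase(cp)) {
                    upper++;
                }
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return letters == 0 ? 0.0 : (double) upper / letters;
    }

    /** True if at least `threshold` of the letters are uppercase. */
    public static boolean looksCaseless(String text, double threshold) {
        return upperCaseRatio(text) >= threshold;
    }

    public static void main(String[] args) {
        String[] samples = {
            "PRESIDENT OBAMA VISITED PARIS ON TUESDAY.",
            "President Obama visited Paris on Tuesday."
        };
        for (String s : samples) {
            System.out.printf("%.2f %b  %s%n", upperCaseRatio(s), looksCaseless(s, 0.9), s);
        }
    }
}
```

This only catches all-uppercase input; all-lowercase text would need a different check, such as whether any words are capitalized at all.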