Reputation: 9718
I know how to train an NER model as specified here and have a very successful one in fact. I also know about the 3 provided caseless models as talked about here. But what if I want to train my own caseless model, what is the trick there? I have a bunch of all uppercase documents for training. Do I use the same training process or are there special/different features for the caseless models or are there properties that need to be set? I can't find a description as to how the provided caseless models were created.
Upvotes: 0
Views: 513
Reputation: 9450
There is only one property change in our models, which is that you want to have it invoke a function that removes case information before words are processed for classification. We do that with this property value (which also maps some words to American spelling):
wordFunction = edu.stanford.nlp.process.LowercaseAndAmericanizeFunction
but there is also simply:
wordFunction = edu.stanford.nlp.process.LowercaseFunction
Having more automatic stuff for deciding document format (hard/soft line breaks), case, or even language would be nice, but at present we don't have any of those....
Upvotes: 2