Reputation: 1
I want to use NER(CRF classifier) to identify Author names in a query. I trained NER following the method given in nlp.stanford.edu site using the training file:training-data.col. And tested using the file:testing-data.tsv.
The NER is tagging every input as Author, even the data that is tagged as non-Author in the training data. Can anyone tell me why NER is tagging the non-Authors in training data as Authors and how to train NER to identify Authors(I have the list of Author names to train).
Any suggestions for reference material on NER other than nlp.stanford.edu site will be helpful.
Upvotes: 0
Views: 396
Reputation: 619
That's a very small piece of training data, so I'm not surprised that it made the wrong inferences. Since the only example it has seen of "Atal" is as Author
, it's tagging "Atal" as such.
But more so, if you want to discriminate between people listed at the beginning as Author
and people listed in the text as 0
, Stanford NER is not going to do that. Stanford NER is intended to make long distance inferences about the named-entity tags of tokens in natural language text. In other words, it's doing the opposite of what you're trying to do.
You could probably do this with some simple pattern recognition---if your documents are formatted in a similar way, with the authors together, I would start with exploiting that. You could use the NER to tag the authors as PERSON
, and then use that tag as a feature in your own tagging.
Upvotes: 1