AvinashK

Reputation: 3423

Segmentation of entities in Named Entity Recognition

I have been using the Stanford NER tagger to find the named entities in a document. The problem I am facing is described below.

Let the sentence be: "The film is directed by Ryan Fleck-Anna Boden pair."

Now the NER tagger marks Ryan as one entity, Fleck-Anna as another and Boden as a third entity. The correct marking should be Ryan Fleck as one and Anna Boden as another.

Is this a problem with the NER tagger, and if so, can it be handled?

Upvotes: 1

Views: 321

Answers (2)

dwatson

Reputation: 41

How about

  • take your data and run it through Stanford NER or some other NER.
  • look at the results and find all the mistakes
  • correctly tag the incorrect results and feed them back into your NER.
  • lather, rinse, repeat...

This is a sort of manual boosting technique. But your NER probably won't learn too much this way.

In this case it looks like there is a new feature, hyphenated names, that the NER needs to learn about. Why not make up a bunch of hyphenated names, put them in some text, tag them, and train your NER on that?

You should get there by adding more features, more data and training.
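As a rough sketch of the "make up a bunch of hyphenated names" idea, the snippet below generates synthetic sentences as one token per line with a tab-separated label, which is the column layout I believe Stanford NER accepts for training. The name lists, the sentence template, and the function names are all made up for illustration. It also assumes the hyphen is split into its own token; the tokenizer used at tagging time would have to split "Fleck-Anna" the same way for this to help.

```python
import random

# Hypothetical name lists, invented for illustration only.
FIRST_NAMES = ["Ryan", "Anna", "Maria", "John"]
LAST_NAMES = ["Fleck", "Boden", "Lopez", "Smith"]


def make_sentence(rng):
    """Return (token, label) pairs for one synthetic sentence."""
    a = (rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES))
    b = (rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES))
    tokens = [(w, "O") for w in ["The", "film", "is", "directed", "by"]]
    tokens += [(a[0], "PERSON"), (a[1], "PERSON")]
    # Assumption: the hyphen is its own token and is not part of any entity.
    tokens.append(("-", "O"))
    tokens += [(b[0], "PERSON"), (b[1], "PERSON")]
    tokens += [("pair", "O"), (".", "O")]
    return tokens


def to_tsv(sentences):
    """One 'token<TAB>label' row per line, blank line between sentences."""
    blocks = ["\n".join(f"{tok}\t{label}" for tok, label in s) for s in sentences]
    return "\n\n".join(blocks) + "\n"


def build_training_text(n, seed=0):
    """Generate n synthetic sentences as training text."""
    rng = random.Random(seed)
    return to_tsv([make_sentence(rng) for _ in range(n)])


if __name__ == "__main__":
    print(build_training_text(2))
```

A single template like this would only teach a very narrow pattern, so in practice you would want many different sentence shapes mixed in with real, correctly tagged text.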

Upvotes: 1

Nitin

Reputation: 43

Instead of using Stanford CoreNLP, you could try Apache OpenNLP. It has an option to train your own model on your own training data. Because the model depends on the names you supply, it is able to detect the names you are interested in.
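For illustration, here is a small sketch of preparing such training data. As far as I know, the OpenNLP name finder trainer expects one whitespace-tokenized sentence per line, with entities wrapped in `<START:type> ... <END>` markers. The helper name `to_opennlp_line` and the (token, label) input shape are my own assumptions, not part of OpenNLP, and the hyphen is assumed to be tokenized separately.

```python
def to_opennlp_line(tokens):
    """Convert (token, label) pairs into one OpenNLP-style training line.

    A label of "O" means the token is outside any entity.
    """
    out = []
    current = None  # label of the entity span currently open, if any
    for tok, label in tokens:
        label = None if label == "O" else label
        if label != current:
            if current is not None:
                out.append("<END>")  # close the previous span
            if label is not None:
                out.append(f"<START:{label.lower()}>")  # open a new span
            current = label
        out.append(tok)
    if current is not None:
        out.append("<END>")  # close a span that runs to the sentence end
    return " ".join(out)


if __name__ == "__main__":
    sentence = [
        ("directed", "O"), ("by", "O"),
        ("Ryan", "PERSON"), ("Fleck", "PERSON"),
        ("-", "O"),  # assumption: hyphen split off as its own token
        ("Anna", "PERSON"), ("Boden", "PERSON"),
        ("pair", "O"), (".", "O"),
    ]
    print(to_opennlp_line(sentence))
```

The resulting lines would go into a plain text file, one sentence per line, that is then passed to the OpenNLP name finder trainer.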

Upvotes: 0
