Tudor Marghidanu

Reputation: 81

Linking multiple name finder entities using OpenNLP

First, a little bit of context: I'm trying to identify street addresses in a corpus of documents. We decided that the obvious solution would be to use an NLP tool (Apache OpenNLP in this case), and so far everything looks great. We still need to train the model with a lot of documents, but that's not really an issue. We improved the solution by adding an extra step for address validation, using the USAddress parser from Datamade. My biggest issue is that the addresses by themselves are nothing without a location next to them. Sometimes the location is specified in the text, and we will assume that this happens quite often.

Here comes my question: is there some way to use coreference to associate the entities in the text? Or, better yet, is there a way to annotate arbitrary words in the text and identify them as being one entity?

I've been looking at the Apache OpenNLP documentation, but it's pretty thin and I think it still needs some work.

Upvotes: 4

Views: 861

Answers (3)

Tudor Marghidanu

Reputation: 81

OK, several months later! It wasn't coref that I was after; what I was actually looking for was Relation Extraction (Information Extraction). I used MITIE (BinaryRelation) and that did the trick. I trained my own model using the Brat annotation tool and got an F1 score of 0.81. Pretty neat...
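To illustrate what binary relation extraction works on, here is a minimal sketch in plain Python. It is not MITIE's API and not the code I actually used: the `Entity` class, the `ADDRESS`/`LOCATION` labels, and the `max_token_gap` cutoff are all hypothetical names chosen for the example. It only shows the candidate-pair step, where each nearby (address, location) pair would then be handed to a trained relation classifier for a yes/no decision.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    label: str  # hypothetical tag names: "ADDRESS" or "LOCATION"
    start: int  # index of the entity's first token
    end: int    # index one past the entity's last token


def candidate_pairs(entities, max_token_gap=15):
    """Pair every ADDRESS with every LOCATION that lies within
    max_token_gap tokens of it. A trained binary relation classifier
    (not shown here) would then accept or reject each pair."""
    addresses = [e for e in entities if e.label == "ADDRESS"]
    locations = [e for e in entities if e.label == "LOCATION"]
    pairs = []
    for addr, loc in product(addresses, locations):
        # Number of tokens between the two spans, whichever comes first.
        gap = max(loc.start - addr.end, addr.start - loc.end)
        if gap <= max_token_gap:
            pairs.append((addr, loc))
    return pairs
```

For example, with the tokens of "Visit us at 123 Main Street in Springfield", the address span covers tokens 3 to 5 and the location is token 7, so the two end up in one candidate pair; a location mentioned dozens of tokens later would be filtered out by the gap cutoff.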

Upvotes: 0

iamgr007

Reputation: 986

If you want to use coreference for this problem, you can have a look at this blog.

But a simpler solution would be to use a sentence detector + regex, or a location NER + sentence detector (presuming each address sits on a single line).

I think US addresses can be identified using a regular expression, and once the regex matches, you can use OpenNLP's sentence detector to print the whole address line.

Similarly, you can use the NER model provided by OpenNLP to find locations and print the sentence you want.
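A rough sketch of the regex + sentence idea, in plain Python so it runs without any model files. Everything here is an assumption made for illustration: `ADDRESS_RE` is a deliberately simplified pattern (house number, one to three capitalized words, a street suffix) that will miss many real US addresses, and `split_sentences` is a naive stand-in for OpenNLP's sentence detector, not a replacement for it.

```python
import re

# Simplified, hypothetical pattern: house number, 1-3 capitalized words,
# then a common street suffix. Real addresses need a far richer pattern.
ADDRESS_RE = re.compile(
    r"\b\d{1,6}\s+(?:[A-Z][a-z]+\s+){1,3}"
    r"(?:Street|St|Avenue|Ave|Road|Rd|Boulevard|Blvd|Drive|Dr|Lane|Ln)\b"
)

# Naive boundary: ., ! or ? followed by whitespace and a capital letter.
# This wrongly splits after abbreviations such as "St." or "Dr.".
SENTENCE_BOUNDARY_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Stand-in for a trained sentence detector."""
    return [s.strip() for s in SENTENCE_BOUNDARY_RE.split(text) if s.strip()]


def sentences_with_addresses(text):
    """Return (matched address, containing sentence) for every regex hit."""
    results = []
    for sentence in split_sentences(text):
        for match in ADDRESS_RE.finditer(sentence):
            results.append((match.group(0), sentence))
    return results
```

Calling `sentences_with_addresses` on a document gives you each matched address together with the sentence it appeared in, which is where a nearby location name is most likely to be found.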

Hope this helps!

edit

This GitHub repo made it simple for us. Check it out!

Upvotes: 1

Vihari Piratla

Reputation: 9332

OpenNLP does not provide a coreference resolution module.
You have to use the Stanford, Illinois, or Berkeley system to accomplish the task. They may not work out of the box; you may have to do some parameter tuning or supervised training to achieve reasonable performance.

@edit
Thanks @Alaye for pointing out that OpenNLP does have a coref module. For more details, see his answer.

Thanks

Upvotes: 0
