stellasia

Reputation: 5622

Having both NER and RegexNER tags in StanfordCoreNLPServer output?

I am using the StanfordCoreNLPServer to extract some information from text (such as surfaces and street names).

The street name is given by a custom-trained NER model, and the surface by a simple regex via RegexNER.

Each of them works fine separately, but when they are used together, only the NER result is present in the output, under the ner tag. Why isn't there a regexner tag? Is there a way to also get the RegexNER result?

For information:

Let me know if more details are needed.

Upvotes: 6

Views: 1388

Answers (3)

BenP

Reputation: 845

Update for the CoreNLP 3.9.2 server via Python:

When using the CoreNLP 3.9.2 server from Python, regexner can now also be configured as part of the ner annotator, as per the docs. For example:

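from pycorenlp import StanfordCoreNLP

# Assumes a CoreNLP server is already running on localhost:9000
nlp = StanfordCoreNLP('http://localhost:9000')

# regexner runs as part of ner; its mapping file is passed via ner.fine.regexner.mapping
properties = {"annotators": "tokenize,ssplit,pos,lemma,ner,coref,openie",
              "outputFormat": "json",
              "ner.fine.regexner.mapping": "rules.txt"}

text = "..."  # placeholder: the text to annotate
output = nlp.annotate(text, properties=properties)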

I could not get the regexner annotator to work by calling it directly. I think this is due to reloading of dependencies and/or the method used to translate outputs to JSON (e.g. this issue).
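To check which label each token ended up with, the JSON response can be walked token by token. The sketch below assumes the response was parsed into a dict with the usual `sentences` / `tokens` / `word` / `ner` layout; the sample dict is hand-written for illustration (it is not real server output), and the `STREET` label is a hypothetical name for the custom model's tag.

```python
def ner_tags(annotation):
    """Return (word, ner) pairs for every token in a parsed CoreNLP JSON response."""
    pairs = []
    for sentence in annotation.get("sentences", []):
        for token in sentence.get("tokens", []):
            # Tokens without an entity label are reported as "O"
            pairs.append((token["word"], token.get("ner", "O")))
    return pairs


# Hand-written stand-in for a server response (assumption, not real output)
sample = {
    "sentences": [
        {
            "index": 0,
            "tokens": [
                {"index": 1, "word": "Baker", "ner": "STREET"},
                {"index": 2, "word": "Street", "ner": "STREET"},
                {"index": 3, "word": ",", "ner": "O"},
                {"index": 4, "word": "45", "ner": "SURFACE"},
                {"index": 5, "word": "m2", "ner": "SURFACE"},
            ],
        }
    ]
}

tags = ner_tags(sample)
```

As far as I can tell, RegexNER writes into the same per-token entity field as the statistical NER, so its labels would show up under `ner` rather than under a separate key.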

Upvotes: 0

Emre Colak

Reputation: 814

Here's what the RegexNER documentation says about this:

RegexNER will not overwrite an existing entity assignment, unless you give it permission in a third tab-separated column, which contains a comma-separated list of entity types that can be overwritten. Only the non-entity O label can always be overwritten, but you can specify extra entity tags which can always be overwritten as well.

Bachelor of (Arts|Laws|Science|Engineering|Divinity)	DEGREE
Lalor	LOCATION	PERSON
Labor	ORGANIZATION

I'm not sure what your mapping file exactly looks like, but if it just maps entities to labels, then the original NER will label your entities as NUMBER, and RegexNER won't be able to overwrite them. If you explicitly declare that some NUMBER entities should be overwritten as SURFACE in your mapping file, then it should work.
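A minimal sketch of how such a mapping line could be generated, with the overwritable types in the third tab-separated column as described in the quoted documentation. The helper `mapping_line` and the surface pattern `[0-9]+ m2` are hypothetical; the real pattern depends on how your text is tokenized.

```python
def mapping_line(pattern, label, overwritable=()):
    """Build one RegexNER mapping line: pattern, label, optional overwritable types."""
    fields = [pattern, label]
    if overwritable:
        # Third column: comma-separated entity types RegexNER may overwrite
        fields.append(",".join(overwritable))
    return "\t".join(fields)


rules = [
    # Hypothetical rule: allow SURFACE to replace an existing NUMBER label
    mapping_line("[0-9]+ m2", "SURFACE", overwritable=["NUMBER"]),
]

# Content to save as the mapping file (e.g. rules.txt)
content = "\n".join(rules) + "\n"
```

Without the third column, a token already tagged NUMBER would keep that label, which matches the behaviour described above.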

Upvotes: 4

stellasia

Reputation: 5622

OK, things seem to work as I want if I put regexner first:

"annotators":"regexner,tokenize,ssplit,pos,ner",

It seems there is an ordering problem at some stage of the process?

Upvotes: 3
