W030 Some entities could not be aligned in the text - but why?

Question

Please help me understand the following spaCy example of entities that could not be aligned:

"text" : "15) Abstract:“The contribution of surface charge was been quantitative determined” -> Correct the grammar."

"labels" : [[13, 82, "LOCATION"], [4, 12, "LOCATION"], [86, 105, "ACTION"]]}

To me it all looks good; the entity offsets seem well aligned. Any idea why I am getting the following warning?

[W030] Some entities could not be aligned in the text
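As a sanity check on the offsets themselves, here is a minimal sketch in plain Python (no spaCy involved) that slices the text with each label's start and end. The names `extracted` and `chunks` are only illustrative. It confirms that the character offsets select the intended strings, and it also shows that the end of the first "Abstract" entity and the start of the quoted entity fall inside the same whitespace-delimited chunk. Whether spaCy splits that chunk into separate tokens depends on its tokenizer rules, which this sketch does not model.

```python
text = "15) Abstract:“The contribution of surface charge was been quantitative determined” -> Correct the grammar."
labels = [[13, 82, "LOCATION"], [4, 12, "LOCATION"], [86, 105, "ACTION"]]

# Slice the text with every (start, end) pair to see what each label covers.
extracted = [(label, text[start:end]) for start, end, label in labels]
for label, snippet in extracted:
    print(label, repr(snippet))

# Split on whitespace only: the colon and the opening quote are glued to
# their neighbours, so two entity boundaries sit inside one chunk.
chunks = text.split()
print(chunks[1])
```

If spaCy is available, my understanding is that `doc.char_span(start, end)` returns `None` for a span that does not line up with token boundaries, which would be one way to find out which of the three labels triggers the warning.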

If I add a space between the colon and the opening double quote after "Abstract" (so that Abstract:“The becomes Abstract: “The) and shift the entity offsets accordingly, I get:

"text" : "15) Abstract: “The contribution of surface charge was been quantitative determined” -> Correct the grammar."

"labels" : [[14, 82, "LOCATION"], [4, 12, "LOCATION"], [87, 106, "ACTION"]]}

Then everything works and the warning goes away. I would like to understand why a single space makes such a difference.

EDIT:

Here is the code I am using to try to get rid of this issue. It works with infixes.extend((":")). However, why doesn't it work with infixes.extend((":", "“", ",", '“', "/", ";", ".", '”'))?

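import spacy

nlp = spacy.blank("en")
nlp.add_pipe("ner")

# Start from the default infix patterns and add ":" as an extra infix.
infixes = list(nlp.Defaults.infixes)
# infixes.extend((":", "“", ",", '“', "/", ";", ".", '”'))  # this variant does not work for me
# Note: (":") is just the string ":", so this adds the single pattern ":".
infixes.extend((":"))
infix_regex = spacy.util.compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_regex.finditer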
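One thing worth checking in the longer list is the bare "." entry. The sketch below uses only the standard `re` module and assumes that the infix entries are treated as regular-expression patterns combined by alternation, which is my understanding of what `compile_infix_regex` does. It is not spaCy's tokenizer, and the names `candidates`, `raw`, and `escaped` are illustrative. Under that assumption, an unescaped "." matches every character, whereas escaping each entry with `re.escape` restricts the matches to the literal punctuation.

```python
import re

# The extra punctuation from the longer extend() call (duplicate removed).
candidates = [":", "“", ",", "/", ";", ".", "”"]

# Joined as-is, the "." entry is a regex wildcard, not a literal full stop.
raw = re.compile("|".join(candidates))
# Escaping each entry makes every pattern match only its literal character.
escaped = re.compile("|".join(re.escape(c) for c in candidates))

sample = "Abstract:“The"
raw_hits = [m.group() for m in raw.finditer(sample)]
escaped_hits = [m.group() for m in escaped.finditer(sample)]

print(raw_hits)      # every single character is reported as a match
print(escaped_hits)  # only the colon and the opening quote
```

If this is indeed the cause, passing escaped patterns (for example `re.escape(".")`) to `infixes.extend` might behave differently from the bare ".", but I have not confirmed that against spaCy itself.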

