Nemo

Custom entity ruler with spaCy did not return a match

This link shows how to create a custom entity ruler.

I copied that code, modified it to build a different custom entity ruler, and used it to find a match in a doc as follows:

import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')
ruler = EntityRuler(nlp)

grades = ["Level 1", "Level 2", "Level 3", "Level 4"]
for item in grades:
    ruler.add_patterns([{"label": "LEVEL", "pattern": item}])

# Appended at the end of the pipeline, i.e. after the model's 'ner' component
nlp.add_pipe(ruler)

doc = nlp('Level 2 employee first 12 months 1032.70')

# Merge each entity span into a single token
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])

# Pattern: a token with entity type LEVEL followed by the word "employee"
matcher = Matcher(nlp.vocab)
pattern =[{'ENT_TYPE': {'REGEX': 'LEVEL'}}, {'ORTH': 'employee'}]
matcher.add('PAY_LEVEL', None, pattern)
matches = matcher(doc)

for match_id, start, end in matches:
    span = doc[start:end]
    print(span)

However, when I run the code (in a Jupyter notebook), nothing is printed.

Could you please tell me:

  1. If the code prints nothing, does that mean no match was found?

  2. Why can't my code find a match even though it's almost identical to the original (apart from the patterns added to the ruler)? What did I do wrong?

Thank you.

aab

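The problem is an interaction between the NER component provided in the English model and your EntityRuler component. Because the EntityRuler is added after the NER component, the NER has already labeled 2 as a number (CARDINAL) by the time the ruler runs. Entities aren't allowed to overlap, and by default the EntityRuler doesn't overwrite existing entities, so it discards its "Level 2" match. With no LEVEL entity in the doc, the Matcher pattern can't match anything.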
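
To answer the first question: yes. `matcher(doc)` returns a list of `(match_id, start, end)` tuples, and when that list is empty the `for` loop body never runs, so nothing is printed. The overlap effect can be reproduced in a small pure-Python toy model. This is a hypothetical simplification, not spaCy's implementation: entities are plain `(start, end, label)` tuples, `add_rule_matches` and `tokens_with_label` are made-up helpers, and the NER output is assumed to be only the CARDINAL span for "2" described above.

```python
# Hypothetical, simplified model of the overlap rule (NOT spaCy's actual code).
# Entities are (start, end, label) token spans; `end` is exclusive.


def add_rule_matches(existing, matches):
    """Keep existing (NER) entities; drop rule matches that overlap them."""
    taken = {i for start, end, _ in existing for i in range(start, end)}
    entities = list(existing)
    for start, end, label in matches:
        if taken & set(range(start, end)):
            continue  # overlaps an existing entity: the rule match is skipped
        entities.append((start, end, label))
        taken.update(range(start, end))
    return sorted(entities)


def tokens_with_label(entities, label):
    """Token indices whose entity type equals `label` (what ENT_TYPE tests)."""
    return [i for start, end, lab in entities if lab == label for i in range(start, end)]


# Tokens: Level(0) 2(1) employee(2) first(3) 12(4) months(5) 1032.70(6)
ner_ents = [(1, 2, "CARDINAL")]  # assumption: only the "2" entity from the answer
level_match = [(0, 2, "LEVEL")]  # span the "Level 2" pattern would cover

ents = add_rule_matches(ner_ents, level_match)
print(ents)                              # [(1, 2, 'CARDINAL')]
print(tokens_with_label(ents, "LEVEL"))  # [] -> the ENT_TYPE pattern has nothing to match
```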

You can either add your EntityRuler before the NER component:

nlp.add_pipe(ruler, before='ner')

Or tell the EntityRuler that it's allowed to overwrite existing entities:

ruler = EntityRuler(nlp, overwrite_ents=True)
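
Both fixes can be compared in a small pure-Python toy model. It is a hypothetical sketch, not spaCy's code: `toy_ruler` and `toy_ner` are made-up stand-ins, the NER is assumed to predict only the CARDINAL span for "2", and `toy_ner` mimics spaCy's documented behavior that an entity recognizer running after an entity ruler keeps the ruler's spans and predicts around them.

```python
# Hypothetical, simplified model of the two fixes (NOT spaCy's actual code).
# Entities are (start, end, label) token spans; `end` is exclusive.


def toy_ruler(existing, matches, overwrite=False):
    """Add rule matches; skip overlapping ones unless overwrite is set."""
    entities = list(existing)
    for start, end, label in matches:
        overlapping = [e for e in entities if e[0] < end and e[1] > start]
        if overlapping and not overwrite:
            continue  # default: the entity that is already there wins
        entities = [e for e in entities if e not in overlapping]
        entities.append((start, end, label))
    return sorted(entities)


def toy_ner(existing, predictions):
    """Stand-in NER that keeps existing spans and only adds non-overlapping ones."""
    taken = {i for s, e, _ in existing for i in range(s, e)}
    kept = [p for p in predictions if not taken & set(range(p[0], p[1]))]
    return sorted(existing + kept)


ner_predictions = [(1, 2, "CARDINAL")]  # assumption: "2" tagged as CARDINAL
level_match = [(0, 2, "LEVEL")]         # the "Level 2" pattern

# Original code: ner runs first, then the ruler with default settings
print(toy_ruler(toy_ner([], ner_predictions), level_match))  # [(1, 2, 'CARDINAL')]
# Fix 1: ruler added before ner
print(toy_ner(toy_ruler([], level_match), ner_predictions))  # [(0, 2, 'LEVEL')]
# Fix 2: ner first, but the ruler may overwrite existing entities
print(toy_ruler(toy_ner([], ner_predictions), level_match, overwrite=True))  # [(0, 2, 'LEVEL')]
```

In both fixes the rule-based "Level 2" span wins. Placing the ruler before `ner` also lets the statistical model adjust its other predictions around that span, whereas `overwrite_ents=True` simply discards the overlapping NER entity.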
