Valdegg

Reputation: 19

Rule-based NER in spaCy: Remove patterns

I have been adding rules to my custom spaCy named entity recognition model using the new EntityRuler ( https://spacy.io/usage/rule-based-matching#entityruler ).

I added 1 million protein names, which took hours to run, and have now realized that many of them are also common words or tokens (like 'FOR' and '11').

I would like to remove some of the patterns from the EntityRuler object ( https://spacy.io/api/entityruler ), but I'm not sure how to do that.

How can I remove rules/patterns from my EntityRuler object without unloading everything and reloading only the ones that should remain?

Upvotes: 1

Views: 1479

Answers (1)

Niels

Reputation: 1491

Looking at the source code (https://github.com/explosion/spaCy/blob/master/spacy/pipeline/entityruler.py), the EntityRuler object passes its patterns directly to the underlying matcher objects: token patterns go to a Matcher, and plain string patterns go to a PhraseMatcher. You can access the Matcher object of the EntityRuler as follows (assuming your EntityRuler object is called entity_ruler):

matcher = entity_ruler.matcher

The Matcher object has a method to remove patterns/rules, as explained in the API (https://spacy.io/api/matcher). It takes the ID (key) under which the rule was added, so you can remove patterns by typing:

matcher.remove(<insert pattern ID here>)
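To make this concrete, here is a minimal sketch of the approach. It assumes that the EntityRuler registers token patterns in its Matcher under the entity label as the key (which is what the linked source appears to do when a pattern has no "id"). The labels PROTEIN and PROTEIN_AMBIGUOUS, the helper found_entities and the example sentence are made up for illustration; the ruler is built on a blank English pipeline and called directly so that no trained model is needed.

```python
import spacy
from spacy.pipeline import EntityRuler

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
ruler = EntityRuler(nlp)

# Hypothetical patterns: the ambiguous name gets its own label so that
# it ends up under its own key in the underlying Matcher.
ruler.add_patterns([
    {"label": "PROTEIN", "pattern": [{"LOWER": "insulin"}]},
    {"label": "PROTEIN_AMBIGUOUS", "pattern": [{"LOWER": "for"}]},
])


def found_entities(text):
    """Run only the ruler on the text and return (text, label) pairs."""
    doc = ruler(nlp.make_doc(text))
    return [(ent.text, ent.label_) for ent in doc.ents]


text = "We tested for insulin today"
before = found_entities(text)

# Assumption: the key is the label. Removing it drops every token
# pattern that was registered under that label.
ruler.matcher.remove("PROTEIN_AMBIGUOUS")
after = found_entities(text)

print(before)
print(after)
```

Two caveats, as far as I can tell from the source. Removal works per key, not per individual pattern, so every token pattern sharing that label goes at once. Names added as plain strings live in entity_ruler.phrase_matcher rather than entity_ruler.matcher, so they would have to be removed there instead (PhraseMatcher only has a remove method in more recent spaCy versions). The ruler also seems to keep its own copy of the patterns, so entity_ruler.patterns may still list the removed ones, and they could come back if you save and reload the ruler.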

Upvotes: 2
