Reputation: 19
I have been adding rules to my custom spaCy named entity recognition model, using the new EntityRuler ( https://spacy.io/usage/rule-based-matching#entityruler ).
I added 1 million protein names, which took hours to run, and have now realized that many of them are also common words or tokens (like 'FOR' and '11').
I would like to remove some of the patterns from the EntityRuler object ( https://spacy.io/api/entityruler ). But I'm not sure how to do that...
How can I remove rules/patterns from my EntityRuler object without unloading everything and reloading only the ones that should remain?
Upvotes: 1
Views: 1479
Reputation: 1491
Looking at the source code (https://github.com/explosion/spaCy/blob/master/spacy/pipeline/entityruler.py), the EntityRuler object passes its token patterns directly to a Matcher object. You can access the EntityRuler's Matcher as follows (assuming your EntityRuler object is called entity_ruler):
matcher = entity_ruler.matcher
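The Matcher object has a remove method for deleting patterns/rules, as explained in the API (https://spacy.io/api/matcher). It takes the ID (key) under which the pattern was added, so you can remove patterns by typing
matcher.remove(<insert pattern ID here>)
Note that plain-string (phrase) patterns are not stored in this Matcher but in a separate PhraseMatcher, which is available as entity_ruler.phrase_matcher.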
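To illustrate the remove call, here is a minimal sketch on a standalone Matcher rather than on a real EntityRuler. It assumes a spaCy version where Matcher.add accepts a list of patterns (v3-style); the keys PROTEIN_FOR and PROTEIN_BRCA1 and the sample sentence are made up for the example. Which keys your own EntityRuler registered internally may differ by spaCy version, so check with `key in matcher` before removing.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: only the tokenizer and vocab are needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical keys; each key can hold one or more token patterns.
matcher.add("PROTEIN_FOR", [[{"TEXT": "FOR"}]])
matcher.add("PROTEIN_BRCA1", [[{"TEXT": "BRCA1"}]])

doc = nlp("Screening FOR and BRCA1 variants")


def matched_keys(matcher, doc):
    """Return the sorted string keys of all patterns that match the doc."""
    return sorted(doc.vocab.strings[match_id] for match_id, _start, _end in matcher(doc))


before = matched_keys(matcher, doc)

# Guard the removal, since removing an unknown key raises an error.
if "PROTEIN_FOR" in matcher:
    matcher.remove("PROTEIN_FOR")

after = matched_keys(matcher, doc)

print("before:", before)
print("after:", after)
```

On an EntityRuler, the same call would be made on entity_ruler.matcher. Be aware that remove drops every pattern stored under the given key, so if many patterns share one key they all go together.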
Upvotes: 2