Reputation: 23
So this is my problem with spaCy rule-based matching.
I have a text, say:
text = ('Wan, Flex, Havelock St, WAN, premium, Fibre, 15a, UK, Fletcher inc, Fletcher, Princeton Street, Fendalton road, Bealey avenue')
doc = nlp3(text)
for ent in doc.ents:
    print(ent, '|', ent.label_)
# This gives me a result where Wan and WAN are classified as PERSON and Fibre as ORG.
# Now I build my custom pattern using the entity ruler:
nlp3 = spacy.load("en_core_web_sm")
ruler = nlp3.add_pipe("entity_ruler", before="ner")
#List of Entities and Patterns
patterns = [{"label": "PRODUCT", "pattern": [{"LOWER": "wan"}, {"LOWER": "fibre"}, {"LOWER": "flex"},{"LOWER": "premium"},{"LOWER": "standard"},{"LOWER": "service"}]}]
ruler.add_patterns(patterns)
nlp3.pipe_names
Even after this, when I run the pipeline, Wan is still classified as a person, while I want WAN, wan, and Fibre to be classified as PRODUCT. What am I doing wrong when adding the patterns? And is there a way to add multiple patterns for one label in a single dictionary? Any help is appreciated.
Upvotes: 1
Views: 1457
Reputation: 15623
Each pattern you add to the Ruler is one sequence of tokens. So you aren't matching each of those terms individually, you're matching all of them in a row, without punctuation. You should add them as separate patterns, something like this:
words = ("wan", "fibre", ...)
patterns = []
for word in words:
    patterns.append({"label": "PRODUCT", "pattern": [{"LOWER": word}]})
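To make that concrete, here is a minimal sketch of the whole thing, assuming spaCy v3 and that the en_core_web_sm model from the question is installed. The helper name make_patterns, the PRODUCT_WORDS list (taken from the terms in the question's pattern), and the demo function are made up for illustration, not part of spaCy.

```python
# Hypothetical word list: the six terms from the question's single pattern.
PRODUCT_WORDS = ["wan", "fibre", "flex", "premium", "standard", "service"]


def make_patterns(words, label="PRODUCT"):
    """Build one single-token, case-insensitive pattern per word."""
    return [{"label": label, "pattern": [{"LOWER": w.lower()}]} for w in words]


def demo():
    # Imported here so the pattern helper above works without spaCy installed.
    import spacy

    # Assumes the en_core_web_sm model has been downloaded.
    nlp = spacy.load("en_core_web_sm")
    # Same placement as in the question: the ruler runs before the NER component.
    ruler = nlp.add_pipe("entity_ruler", before="ner")
    ruler.add_patterns(make_patterns(PRODUCT_WORDS))

    doc = nlp("Wan, Flex, Havelock St, WAN, premium, Fibre, 15a, UK")
    for ent in doc.ents:
        print(ent.text, "|", ent.label_)


if __name__ == "__main__":
    demo()
```

Because each pattern is a one-token list matched on LOWER, "Wan", "WAN" and "wan" are all covered by the same entry, so you don't need a separate pattern per casing.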
Couple of other things:
You may also need to set overwrite_ents = True on the entity ruler to get the results you want, see here.
Upvotes: 1