Michael Bewin

Reputation: 23

Spacy - adding multiple patterns to a single NER using entity ruler

I have a problem with spaCy rule-based matching.

Say I have a text like this:

text = ('Wan, Flex, Havelock St, WAN, premium, Fibre, 15a, UK, Fletcher inc, Fletcher, Princeton Street, Fendalton road, Bealey avenue')

doc = nlp3(text)

for ent in doc.ents:
    print(ent, '|', ent.label_)

# This gives me a result where Wan and WAN are classified as PERSON and Fibre as ORG.

# Now I build my custom pattern using the entity ruler:

nlp3 = spacy.load("en_core_web_sm")

ruler = nlp3.add_pipe("entity_ruler", before="ner")

#List of Entities and Patterns

patterns = [{"label": "PRODUCT", "pattern": [{"LOWER": "wan"}, {"LOWER": "fibre"}, {"LOWER": "flex"},{"LOWER": "premium"},{"LOWER": "standard"},{"LOWER": "service"}]}]

ruler.add_patterns(patterns)

nlp3.pipe_names

Even after this, when I run the pipeline, Wan is still classified as a person (I want WAN, wan, and Fibre to be classified as PRODUCT). What am I doing wrong when adding the patterns? And is there a way to add multiple patterns for one label in a single dictionary? Any help is appreciated.

Upvotes: 1

Views: 1457

Answers (1)

polm23

Reputation: 15623

Each pattern you add to the ruler is one sequence of tokens. So you aren't matching each of those terms individually; you're matching all six of them in a row, with no punctuation in between. You should add them as separate patterns, something like this:

words = ("wan", "fibre", ...)
patterns = []
for word in words:
    patterns.append({"label":"PRODUCT", "pattern":[{"LOWER":word}]})
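
For reference, here is a minimal runnable sketch of that idea. It assumes spaCy v3 and uses a blank English pipeline so it runs without downloading a model; the term list is copied from the pattern in the question, and product_ents is just a hypothetical helper for this example. It also shows an alternative that should behave the same way: a single one-token pattern using the "IN" operator, which is probably the closest thing to "multiple patterns in a single dictionary".

```python
import spacy

# Terms copied from the pattern in the question.
words = ("wan", "fibre", "flex", "premium", "standard", "service")

# Option A: one single-token pattern per term.
per_word = [{"label": "PRODUCT", "pattern": [{"LOWER": w}]} for w in words]

# Option B: one single-token pattern whose lowercase form may be any of the terms.
combined = [{"label": "PRODUCT", "pattern": [{"LOWER": {"IN": list(words)}}]}]


def product_ents(patterns, text):
    """Hypothetical helper: run a blank pipeline with only an entity ruler."""
    nlp = spacy.blank("en")  # no statistical NER in this sketch
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


text = "Wan, Flex, Havelock St, WAN, premium, Fibre, 15a, UK"
print(product_ents(per_word, text))
print(product_ents(combined, text))
```

Both options should tag Wan, Flex, WAN, premium and Fibre as PRODUCT here. With a trained pipeline such as en_core_web_sm the statistical NER also gets a say, so results on the full text may differ.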

Couple of other things:

  • you may need to set overwrite_ents = True on the entity ruler to get the results you want (see the spaCy EntityRuler documentation).
  • if your actual input looks like "Wan, Flex, Havelock St, WAN, premium, ...", that's not the normal prose the spaCy models were trained on, and they may not work very well.
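
To illustrate the overwrite_ents point, here is a rough sketch, again assuming spaCy v3. To keep it runnable without a model download, a first entity ruler (named stand_in_ner, a made-up name) plays the role of the statistical NER by labelling "wan" as PERSON; it is only a stand-in, not how the real NER works. A second ruler then tries to label the same token as PRODUCT, with and without overwrite_ents.

```python
import spacy


def build(overwrite):
    """Hypothetical helper: two rulers in a row on a blank English pipeline."""
    nlp = spacy.blank("en")
    # Stand-in for the statistical NER: labels "wan" as PERSON first.
    first = nlp.add_pipe("entity_ruler", name="stand_in_ner")
    first.add_patterns([{"label": "PERSON", "pattern": [{"LOWER": "wan"}]}])
    # The custom ruler runs afterwards and may or may not overwrite.
    second = nlp.add_pipe(
        "entity_ruler",
        name="product_ruler",
        config={"overwrite_ents": overwrite},
    )
    second.add_patterns([{"label": "PRODUCT", "pattern": [{"LOWER": "wan"}]}])
    return nlp


text = "Wan, Flex, WAN"
kept = [(e.text, e.label_) for e in build(False)(text).ents]
replaced = [(e.text, e.label_) for e in build(True)(text).ents]
print(kept)      # existing PERSON entities are left alone
print(replaced)  # the later ruler replaces them with PRODUCT
```

In a real pipeline this matters when the ruler runs after "ner". If the ruler is added before "ner", as in the question, its entities are set first and the setting may not be needed.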

Upvotes: 1
