user2067030

Reputation: 764

spaCy Matcher: dealing with overlapping matches

I am new to spaCy and am experimenting with the Matcher. What I don't know is how to make the Matcher keep only one match when matches overlap. I want to match both "brain tumor" and "tumor", because there may be other types of tumor, but once it finds both matches I don't know how to pick one. I tried playing with the callback functions, but I cannot figure out from the examples how to make it work.

doc = nlp("brain tumor resection")

pattern1 = [{'LOWER':'brain'}, [{'LOWER':'tumor'}]
pattern2 = [[{'LOWER':'tumor'}]

matcher.add("tumor", None, pattern1, pattern2)

phrase_matches = matcher(doc)

This gives me two matches: (0, 2, brain tumor) and (1, 2, tumor).

Desired output: only one match, in this case "brain tumor". But I'm also not sure how to adapt this when, in other cases, I find something like "spine tumor". How do I add logic so that the final output picks one match based on whatever rules a domain expert needs?

Upvotes: 1

Views: 949

Answers (1)

Wiktor Stribiżew

Reputation: 627128

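You need to fix the syntax a bit (remove the redundant [ in the pattern definitions) and then use spacy.util.filter_spans to get the final matches. filter_spans removes duplicate and overlapping spans, and when spans overlap it keeps the (first) longest one, so "brain tumor" wins over "tumor".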
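As a rough illustration of that selection rule, here is a pure-Python sketch that works on (start, end) token offsets instead of Span objects. It is an approximation written for this answer (the name filter_longest is hypothetical), not spaCy's actual source:

```python
def filter_longest(spans):
    """Keep non-overlapping (start, end) token spans, preferring longer ones.

    Approximates the rule documented for spacy.util.filter_spans: sort by
    length (longest first), break ties by earliest start, then keep each span
    whose tokens are not already claimed by a previously kept span.
    """
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen_tokens = set()
    for start, end in ordered:
        tokens = set(range(start, end))
        if tokens.isdisjoint(seen_tokens):
            kept.append((start, end))
            seen_tokens.update(tokens)
    # Return the survivors in document order
    return sorted(kept)


print(filter_longest([(0, 2), (1, 2)]))  # prints [(0, 2)]
```

With the two matches from the question, (0, 2) is longer, so it is kept and (1, 2) is dropped because token 1 is already taken.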

See a code demo:

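import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

doc = nlp("brain tumor resection")
# Each pattern is a flat list of token dicts (no extra [ before a dict)
pattern1 = [{'LOWER': 'brain'}, {'LOWER': 'tumor'}]
pattern2 = [{'LOWER': 'tumor'}]
# spaCy v3 takes the patterns as a list; the older v2 form was
# matcher.add("tumor", None, pattern1, pattern2)
matcher.add("tumor", [pattern1, pattern2])

# The Matcher returns (match_id, start, end) tuples; turn each into a Span
spans = [doc[start:end] for _, start, end in matcher(doc)]
# Drop overlapping spans, keeping the longest one
for span in spacy.util.filter_spans(spans):
    print((span.start, span.end, span.text))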

Output: (0, 2, 'brain tumor').
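If a plain "longest wins" rule is not enough (the spine tumor case, or any rule an expert defines), you can post-process the matches yourself. The sketch below is one hypothetical way to do it, not a spaCy feature: it groups matches whose token ranges overlap and hands each group to a choose function. It assumes the patterns were registered under separate labels (ORGAN_TUMOR and TUMOR are made-up names) and that the Matcher output was converted to (label, start, end) tuples, e.g. with nlp.vocab.strings[match_id].

```python
from typing import Callable, List, Tuple

# A match as (label, start, end), with token offsets like the Matcher output.
Match = Tuple[str, int, int]


def group_overlaps(matches: List[Match]) -> List[List[Match]]:
    """Group matches whose token ranges overlap (transitively) into clusters."""
    clusters: List[List[Match]] = []
    cluster_end = -1
    for match in sorted(matches, key=lambda m: (m[1], m[2])):
        _, start, end = match
        if clusters and start < cluster_end:
            # Overlaps the current cluster: extend it
            clusters[-1].append(match)
            cluster_end = max(cluster_end, end)
        else:
            # No overlap: start a new cluster
            clusters.append([match])
            cluster_end = end
    return clusters


def resolve(matches: List[Match], choose: Callable[[List[Match]], Match]) -> List[Match]:
    """Return one match per overlapping cluster, picked by `choose`."""
    return [choose(cluster) for cluster in group_overlaps(matches)]


# Hypothetical expert rule: prefer a higher-priority label, then the longer span.
PRIORITY = {"ORGAN_TUMOR": 2, "TUMOR": 1}


def expert_choice(cluster: List[Match]) -> Match:
    return max(cluster, key=lambda m: (PRIORITY.get(m[0], 0), m[2] - m[1]))


print(resolve([("ORGAN_TUMOR", 0, 2), ("TUMOR", 1, 2)], expert_choice))
# prints [('ORGAN_TUMOR', 0, 2)]
```

For "brain tumor resection" the two matches form one group and expert_choice returns ("ORGAN_TUMOR", 0, 2). Swapping in a different choose function changes the policy without touching the grouping step. One trade-off: overlaps are grouped transitively, so a chain of overlapping matches yields a single pick.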

Upvotes: 1
