Reputation: 764
I am new to spaCy and am experimenting with the Matcher. What I do not know is how to make the matcher pick one match when matches overlap. I want to be able to match both "brain tumor" and "tumor" because there may be other types of tumor. But once it finds both matches, I don't know how to make it pick one. I tried playing with the callback functions but cannot figure out from the examples how to make them work.
doc = nlp("brain tumor resection")
pattern1 = [{'LOWER':'brain'}, [{'LOWER':'tumor'}]
pattern2 = [[{'LOWER':'tumor'}]
matcher.add("tumor", None, pattern1, pattern2)
phrase_matches = matcher(doc)
This gives me (0, 2, brain tumor) and (1, 2, tumor).
The desired output is just one match, in this case "brain tumor". I am also not sure how to adapt this to other cases, such as "spine tumor". How do you add logic so that the final output picks one match based on whatever the expert needs?
Upvotes: 1
Views: 949
Reputation: 627128
You need to fix the syntax a bit (remove the redundant [ in the pattern definitions) and use spacy.util.filter_spans to get the final matches.
See a code demo:
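import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
doc = nlp("brain tumor resection")
pattern1 = [{'LOWER':'brain'}, {'LOWER':'tumor'}]
pattern2 = [{'LOWER':'tumor'}]
# spaCy v2 signature; in spaCy v3 use matcher.add("tumor", [pattern1, pattern2])
matcher.add("tumor", None, pattern1, pattern2)
# Convert the (match_id, start, end) tuples into Span objects
spans = [doc[start:end] for _, start, end in matcher(doc)]
# filter_spans drops overlapping spans, preferring the (first) longest one
for span in spacy.util.filter_spans(spans):
    print((span.start, span.end, span.text))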
Output: (0, 2, 'brain tumor')
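If the longest-match rule is not what the expert needs, the same filtering idea can be written by hand with a different priority. The following is only a rough sketch in plain Python, not spaCy's own implementation: the names resolve_overlaps and longest_first are hypothetical, and the input is assumed to be (label, start, end) tuples with an exclusive end, built from the matcher output.

```python
from typing import Callable, List, Tuple

# (label, start, end) with end exclusive; an assumed shape for illustration
Match = Tuple[str, int, int]


def longest_first(match: Match) -> Tuple[int, int]:
    """Default priority: longer matches win; ties go to the earlier start."""
    _, start, end = match
    return (end - start, -start)


def resolve_overlaps(
    matches: List[Match],
    priority: Callable[[Match], Tuple] = longest_first,
) -> List[Match]:
    """Keep the highest-priority match wherever matches share a token."""
    taken = set()  # token indices already claimed by a kept match
    kept = []
    for match in sorted(matches, key=priority, reverse=True):
        _, start, end = match
        if not any(i in taken for i in range(start, end)):
            kept.append(match)
            taken.update(range(start, end))
    # Return the survivors in document order
    return sorted(kept, key=lambda m: m[1])


matches = [("tumor", 0, 2), ("tumor", 1, 2)]
print(resolve_overlaps(matches))  # [('tumor', 0, 2)]

# A different (hypothetical) rule: prefer the shortest match instead
shortest_first = lambda m: (-(m[2] - m[1]), -m[1])
print(resolve_overlaps(matches, priority=shortest_first))  # [('tumor', 1, 2)]
```

Any function that returns a sortable key can be passed as priority, for example one that ranks matches by a label the expert cares about.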
Upvotes: 1