My question: I took the following code from the spaCy documentation:
import spacy
from spacy.matcher import Matcher

def on_match(matcher, doc, id, matches):
    # Called once per match; id is the index of the current match in matches
    print("Matched!", matches)

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
patterns = [
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [{"ORTH": "Google"}, {"ORTH": "Maps"}],
]
matcher.add("TEST_PATTERNS", patterns, on_match=on_match)
doc = nlp("HELLO WORLD on Google Maps.")
matches = matcher(doc)
How can I merge these patterns so that they only match something like "HELLO WORLD ... Google Maps"? Thanks a lot.
If you add this loop after your code:
for match_id, start, end in matches:
    print(doc[start:end].text)
the output will be:
HELLO WORLD
Google Maps
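To check this without downloading a model, here is a minimal sketch that rebuilds the same matcher on spacy.blank("en") instead of en_core_web_sm (an assumption: both patterns only look at token text via LOWER and ORTH, so a trained pipeline should not be needed). It also looks up the string key of each match, showing that both hits come from the single TEST_PATTERNS entry.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline stands in for en_core_web_sm;
# the patterns only use token text (LOWER/ORTH), so no trained model is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)
patterns = [
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [{"ORTH": "Google"}, {"ORTH": "Maps"}],
]
matcher.add("TEST_PATTERNS", patterns)

doc = nlp("HELLO WORLD on Google Maps.")
matches = matcher(doc)

# Each match is a (match_id, start, end) tuple; match_id is a hash that the
# vocab's StringStore maps back to the key passed to matcher.add().
found = []
for match_id, start, end in matches:
    label = nlp.vocab.strings[match_id]
    span = doc[start:end]
    found.append((label, span.text))
    print(label, span.text)
```

Both patterns share one key, but each produces its own, separate match.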
So you get two matches, because the patterns in the list passed to matcher.add are alternatives that are matched independently, not in sequence:
- [{"LOWER": "hello"}, {"LOWER": "world"}] matches two consecutive tokens whose lowercased text is hello and world (so it finds HELLO WORLD);
- [{"ORTH": "Google"}, {"ORTH": "Maps"}] matches two consecutive tokens whose text is exactly Google and Maps (so it finds Google Maps).
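To match only "HELLO WORLD ... Google Maps", one option is to put all the tokens into a single pattern and bridge the gap with a wildcard token {"OP": "*"} (any token, zero or more times). This is a minimal sketch, again assuming spacy.blank("en") in place of en_core_web_sm; the key HELLO_TO_MAPS and the helper spans() are hypothetical names used only for illustration.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # assumption: blank pipeline instead of en_core_web_sm
matcher = Matcher(nlp.vocab)

# One pattern: "hello world" (any casing), then any number of tokens
# (including none), then exactly "Google Maps".
pattern = [
    {"LOWER": "hello"},
    {"LOWER": "world"},
    {"OP": "*"},  # wildcard: no attributes, zero or more tokens
    {"ORTH": "Google"},
    {"ORTH": "Maps"},
]
matcher.add("HELLO_TO_MAPS", [pattern])  # hypothetical key name

def spans(text):
    """Hypothetical helper: return the text of every match in `text`."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]

print(spans("HELLO WORLD on Google Maps."))
print(spans("Google Maps says hello world."))
```

The first call returns one span covering "HELLO WORLD on Google Maps"; the second returns an empty list because the order is reversed. If a text could contain several candidate spans, the greedy argument of matcher.add ("FIRST" or "LONGEST") can be used to filter overlapping matches.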