T.I

Reputation: 73

spaCy Matcher on_match called twice

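My question: I took the following code from the spaCy documentation:

import spacy
from spacy.matcher import Matcher

def on_match(matcher, doc, i, matches):
    # The matcher calls this once for every match it finds
    print("Matched!", matches)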

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
patterns = [
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [{"ORTH": "Google"}, {"ORTH": "Maps"}],
]
matcher.add("TEST_PATTERNS", patterns, on_match=on_match)
doc = nlp("HELLO WORLD on Google Maps.")
matches = matcher(doc)

How can I merge these patterns so that they only match something like "HELLO WORLD ... Google Maps"? Thanks a lot.

Upvotes: 1

Views: 380

Answers (1)

Wiktor Stribiżew

Reputation: 627128

If you add this loop after your code:

for match_id, start, end in matches:
    print(doc[start:end].text)

The output will be

HELLO WORLD
Google Maps

So there are two matches, and the matcher calls on_match once per match, which is why the callback fires twice. The two matches come from:

  • [{"LOWER": "hello"}, {"LOWER": "world"}]: matches two consecutive tokens whose lowercase forms are hello and world, so it finds HELLO WORLD.
  • [{"ORTH": "Google"}, {"ORTH": "Maps"}]: matches two consecutive tokens whose exact text is Google and Maps, so it finds Google Maps.
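A minimal sketch of both points, assuming spaCy v3 (the matcher.add(key, patterns, on_match=...) signature used in the question) and using spacy.blank("en") instead of en_core_web_sm, since these patterns only need the tokenizer. The run helper is hypothetical, written just for this illustration: it records the span the callback sees on each call, first for the two original patterns and then for a single merged pattern in which {"OP": "*"} (a token with no constraints, repeated zero or more times) stands in for the "..." between HELLO WORLD and Google Maps.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because the patterns
# only use the tokenizer's LOWER/ORTH attributes (no trained components).
nlp = spacy.blank("en")
doc = nlp("HELLO WORLD on Google Maps.")


def run(patterns):
    """Hypothetical helper: add `patterns` under one key and return the
    text of the span seen on each on_match call."""
    hits = []

    def on_match(matcher, doc, i, matches):
        # spaCy invokes the callback once per match; `i` indexes the
        # current match inside the full `matches` list.
        _, start, end = matches[i]
        hits.append(doc[start:end].text)

    matcher = Matcher(nlp.vocab)
    matcher.add("TEST_PATTERNS", patterns, on_match=on_match)
    matcher(doc)
    return hits


# The original two patterns: two independent matches, two callback calls.
separate = run([
    [{"LOWER": "hello"}, {"LOWER": "world"}],
    [{"ORTH": "Google"}, {"ORTH": "Maps"}],
])
print(separate)

# One merged pattern: {"OP": "*"} lets any number of tokens sit in between.
merged = run([
    [{"LOWER": "hello"}, {"LOWER": "world"}, {"OP": "*"},
     {"ORTH": "Google"}, {"ORTH": "Maps"}],
])
print(merged)
```

With the two separate patterns the callback records two spans; with the merged pattern it records a single span covering "HELLO WORLD on Google Maps". Keep in mind that "*" is unbounded: if several "Google Maps" follow one "HELLO WORLD", each of them ends a separate, overlapping match. In spaCy v3, passing greedy="LONGEST" to matcher.add filters overlapping matches for that key down to the longest ones.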

Upvotes: 1
