Laz22434

Reputation: 373

spaCy Matcher: return the rule pattern that matched

I need help with the rule-based Matcher in spaCy. I have this code:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
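# Add two patterns under the match ID "HelloWorld", with no callback
hello_pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
# LOWER is compared against the lowercased token text, so the value must be lowercase
good_night_pattern = [{"LOWER": "good"}, {"IS_PUNCT": True}, {"LOWER": "night"}]

matcher.add("HelloWorld", [hello_pattern, good_night_pattern])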

doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)

Everything works well: I get the match_id, string_id, and so on. What I would like to know is whether spaCy can also return the pattern that produced the matched span.

For example, for the text above,

[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]

is the pattern that produced the match.

Thank you very much

Upvotes: 0

Views: 1048

Answers (2)

Wiktor Stribiżew

Reputation: 626799

If every pattern has a unique name, you can work around this by keeping the patterns in a list of dictionaries, where each key is the pattern name and each value is the pattern itself. Once you have a match, you can look up the pattern by its name:

import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
patterns = [  # Define patterns
    {'HelloWorld': [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {'GoodNight': [{"LOWER": "good"}, {"LOWER": "night"}]}
]
for p in patterns:  # Add patterns to the matcher
    for name, pattern in p.items():
        matcher.add(name, [pattern])
doc = nlp("Hello, world! Hello world! Good night!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
    print("The pattern is:", [p for p in patterns if string_id in p][0][string_id])

Output:

15578876784678163569 HelloWorld 0 3 Hello, world
The pattern is: [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
15528765659627300253 GoodNight 7 9 Good night
The pattern is: [{'LOWER': 'good'}, {'LOWER': 'night'}]
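As a possible variation on the same idea, the matcher itself can hand back what was registered under a name, so a separate list may not be needed. The sketch below assumes spaCy v3, where Matcher.get returns an (on_match, patterns) tuple for a rule key, and it uses spacy.blank("en") instead of en_core_web_sm purely to keep the example free of a model download. Note that it returns every pattern stored under the name, so it only pinpoints a single pattern when each name holds exactly one.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough here: LOWER and IS_PUNCT are lexical
# attributes, so no trained model is needed (an assumption made for brevity).
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One pattern per name, so the name identifies the pattern
matcher.add("HelloWorld", [[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]])
matcher.add("GoodNight", [[{"LOWER": "good"}, {"LOWER": "night"}]])

doc = nlp("Hello, world! Hello world! Good night!")

results = []
for match_id, start, end in matcher(doc):
    # Matcher.get returns an (on_match, patterns) tuple for the rule key
    on_match, rule_patterns = matcher.get(match_id)
    results.append((nlp.vocab.strings[match_id], doc[start:end].text, rule_patterns))

for name, text, rule_patterns in results:
    print(name, text, rule_patterns)
```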

Upvotes: 0

polm23

Reputation: 15593

If multiple patterns are added with the same label, you can't tell after the fact which of them matched.

There are a couple of things you can do. A very simple one is to use different labels for each pattern. Another option is to use pattern IDs with the EntityRuler.
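A rough sketch of the EntityRuler option, assuming spaCy v3 (where the pattern ID is exposed on the entity as ent.ent_id_). The label "GREETING", the IDs "hello-world" and "good-night", and the patterns_by_id lookup table are made up for illustration; spacy.blank("en") is used only so the example runs without a trained model.

```python
import spacy

# Blank English pipeline; no trained model is needed for these token patterns.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Both patterns share the label "GREETING"; the "id" values are hypothetical.
patterns = [
    {"label": "GREETING", "id": "hello-world",
     "pattern": [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {"label": "GREETING", "id": "good-night",
     "pattern": [{"LOWER": "good"}, {"LOWER": "night"}]},
]
ruler.add_patterns(patterns)

# Lookup table from pattern ID back to the token pattern
patterns_by_id = {p["id"]: p["pattern"] for p in patterns}

doc = nlp("Hello, world! Good night!")
found = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
for text, label, pattern_id in found:
    print(text, label, pattern_id, patterns_by_id[pattern_id])
```

Here the label stays the same for both patterns, and the ID is what distinguishes them, so the lookup by ID recovers the exact pattern that fired.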

Upvotes: 1
