Reputation: 373
I need help with the rule-based Matcher in spaCy. I have this code:
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
# Add match ID "HelloWorld" with no callback and two patterns
# (LOWER is compared against lowercased text, so its value must be lowercase)
hello_pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
night_pattern = [{"LOWER": "good"}, {"IS_PUNCT": True}, {"LOWER": "night"}]
matcher.add("HelloWorld", [hello_pattern, night_pattern])
doc = nlp("Hello, world! Hello world!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
Everything works well: I get the match_id, string_id, and so on. What I'd like to know is whether spaCy can also give me the pattern that produced the matched span.
For example, in the code above,
[{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
is the pattern corresponding to the match.
Thank you very much.
Upvotes: 0
Views: 1048
Reputation: 626799
If every pattern has a unique name, one workaround is to keep the patterns in a list of dictionaries, each mapping a pattern name to the actual pattern. Once you have a match, you can look the pattern up by its name:
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

patterns = [  # Define patterns
    {'HelloWorld': [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {'GoodNight': [{"LOWER": "good"}, {"LOWER": "night"}]}
]
for p in patterns:  # Adding patterns to matcher
    for name, pattern in p.items():
        matcher.add(name, [pattern])

doc = nlp("Hello, world! Hello world! Good night!")
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(match_id, string_id, start, end, span.text)
    print("The pattern is:", [p for p in patterns if string_id in p][0][string_id])
Output:
15578876784678163569 HelloWorld 0 3 Hello, world
The pattern is: [{'LOWER': 'hello'}, {'IS_PUNCT': True}, {'LOWER': 'world'}]
15528765659627300253 GoodNight 7 9 Good night
The pattern is: [{'LOWER': 'good'}, {'LOWER': 'night'}]
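A slightly shorter variant of the same idea, as a sketch: keep the patterns in one flat dict and ask the matcher itself for the patterns registered under a name with `Matcher.get`, which returns an `(on_match, patterns)` tuple. This assumes spaCy v3 and, as before, one pattern per name. `spacy.blank("en")` is used only so the snippet runs without a downloaded model; that choice and the `results` list are mine, not part of the code above.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, since the
# patterns only use LOWER and IS_PUNCT (tokenizer-level attributes).
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# One unique name per pattern, in a single flat dict
patterns = {
    "HelloWorld": [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}],
    "GoodNight": [{"LOWER": "good"}, {"LOWER": "night"}],
}
for name, pattern in patterns.items():
    matcher.add(name, [pattern])

doc = nlp("Hello, world! Hello world! Good night!")

results = []  # (name, matched text, pattern) triples
for match_id, start, end in matcher(doc):
    name = nlp.vocab.strings[match_id]
    # Matcher.get returns (callback, list of patterns stored for this name)
    _on_match, stored = matcher.get(name)
    results.append((name, doc[start:end].text, stored[0]))

for name, text, pattern in results:
    print(name, "|", text, "|", pattern)
```

Because each name holds exactly one pattern, `stored[0]` is unambiguous. With several patterns under one name, `stored` would contain all of them and the match alone would not say which one fired.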
Upvotes: 0
Reputation: 15593
If multiple patterns are added under the same label, you can't tell afterwards which one matched.
There are a couple of things you can do. A very simple one is to use a different label for each pattern. Another option is to use pattern IDs with the EntityRuler.
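A minimal sketch of the EntityRuler option, assuming spaCy v3: both patterns share one label, and an optional `"id"` on each pattern entry is what tells them apart on the resulting entity. The label `GREETING`, the ID strings, and the `by_id` lookup dict are illustrative names, and `spacy.blank("en")` is used only to avoid needing a downloaded model.

```python
import spacy

# Assumption: a blank English pipeline with only an entity ruler added
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Same label for both patterns; the "id" distinguishes them
rules = [
    {"label": "GREETING", "id": "hello-world",
     "pattern": [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]},
    {"label": "GREETING", "id": "good-night",
     "pattern": [{"LOWER": "good"}, {"LOWER": "night"}]},
]
ruler.add_patterns(rules)

# Plain-Python lookup from pattern ID back to the token pattern
by_id = {rule["id"]: rule["pattern"] for rule in rules}

doc = nlp("Hello, world! Hello world! Good night!")

# ent_id_ carries the ID of the pattern that produced the entity
found = [(ent.text, ent.label_, ent.ent_id_) for ent in doc.ents]
for text, label, pattern_id in found:
    print(text, label, pattern_id, by_id[pattern_id])
```

Note that this changes what you get back: the ruler writes entities to `doc.ents` rather than returning `(match_id, start, end)` tuples, so overlapping matches are not all reported the way they are with the `Matcher`.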
Upvotes: 1