Reputation: 552
I am doing rule-based phrase matching in spaCy. I am trying the following example, but it is not working.
Example
import spacy
from spacy.matcher import Matcher
nlp = spacy.load('en_core_web_sm')
doc = nlp('Hello world!')
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher = Matcher(nlp.vocab)
matcher.add('HelloWorld', None, pattern)
matches = matcher(doc)
print(matches)
The final matches is an empty list. Could you please correct me?
Upvotes: 4
Views: 276
Reputation: 626802
To match both "hello world" and "hello, world", you may use
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP" : "?"}, {"LOWER": "world"}]
The {"IS_PUNCT": True, "OP": "?"} entry means that a punctuation token may appear zero or one time between hello and world (because of "OP": "?").
See more about operators and quantifiers in the spaCy documentation.
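As a rough end-to-end sketch of the optional-punctuation pattern, the snippet below runs it against both inputs. It makes a few assumptions that are not in the question: it uses spacy.blank("en") instead of en_core_web_sm (the LOWER and IS_PUNCT attributes only need the tokenizer), it uses the list-based matcher.add(key, [pattern]) call form, which should work on spaCy 2.2.2 and later including v3, and find_hello_world is a hypothetical helper name.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline is enough here, because the pattern
# only relies on lexical attributes (LOWER, IS_PUNCT) set by the tokenizer.
nlp = spacy.blank("en")

# The punctuation token is optional thanks to "OP": "?".
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]

matcher = Matcher(nlp.vocab)
# Assumption: list-based add() signature (spaCy 2.2.2+ and v3).
matcher.add("HelloWorld", [pattern])


def find_hello_world(text):
    """Hypothetical helper: return the text of every matched span."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(find_hello_world("Hello world!"))
print(find_hello_world("Hello, world!"))
```

Both calls should report one match each, with and without the comma, while the trailing "!" stays outside the span because the pattern ends at world.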
Upvotes: 2
Reputation: 11474
Your pattern matches "Hello, world", with a punctuation token in the middle, not "Hello world".
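One way to check this is to look at how each string is tokenized. The sketch below assumes a blank English pipeline (spacy.blank("en")) rather than the en_core_web_sm model from the question, and describe_tokens is a hypothetical helper name introduced only for illustration.

```python
import spacy

# Assumption: the tokenizer of a blank English pipeline is sufficient
# to inspect token boundaries and the is_punct flag.
nlp = spacy.blank("en")


def describe_tokens(text):
    """Hypothetical helper: list each token with its is_punct flag."""
    return [(token.text, token.is_punct) for token in nlp(text)]


# No punctuation token sits between "Hello" and "world" here,
# so a pattern that requires one in the middle cannot match.
print(describe_tokens("Hello world!"))

# Here the comma is its own punctuation token between the two words.
print(describe_tokens("Hello, world!"))
```

In the first string the only punctuation token is the final "!", which comes after world, so the three-token pattern from the question has nothing to align with.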
Upvotes: 2