Samual

Reputation: 552

spaCy Rule-Based Phrase Matching for Hello World

I am doing rule-based phrase matching in spaCy. I am trying the following example, but it is not working.

Example

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('en_core_web_sm')
doc = nlp('Hello world!')

# "hello", then exactly one punctuation token, then "world"
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]

matcher = Matcher(nlp.vocab)
# spaCy v3 expects a list of patterns; the older v2 form was add(key, None, pattern)
matcher.add('HelloWorld', [pattern])

matches = matcher(doc)
print(matches)

The final matches is an empty list. Could you please correct me?
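To see why nothing matches, it can help to inspect how the sentence is tokenized. The sketch below assumes spaCy v3 and swaps in spacy.blank("en") for en_core_web_sm, on the assumption that LOWER and IS_PUNCT are lexical attributes set by the tokenizer, so no trained model is needed for this check.

```python
import spacy

# Assumption: a blank English pipeline tokenizes the same way as
# en_core_web_sm for this input, and is enough for lexical attributes.
nlp = spacy.blank("en")
doc = nlp("Hello world!")

# Inspect each token's text, lowercase form and punctuation flag.
rows = [(t.text, t.lower_, t.is_punct) for t in doc]
for row in rows:
    print(row)
```

The only punctuation token is the final "!", which comes after "world", so there is no punctuation token between "hello" and "world" for the middle element of the pattern to consume.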

Upvotes: 4

Views: 276

Answers (2)

Wiktor Stribiżew

Reputation: 626802

To match both "hello world" and "hello, world", you may use

pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP" : "?"}, {"LOWER": "world"}]

The {"IS_PUNCT": True, "OP" : "?"} means that a punctuation token may appear zero times or once (because of "OP" : "?") between hello and world.
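A minimal sketch of the optional-token pattern applied to both inputs, assuming spaCy v3 (list-of-patterns form of Matcher.add) and using spacy.blank("en") instead of en_core_web_sm, since only lexical attributes are involved. The dictionary name results is just for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English tokenizer suffices for LOWER / IS_PUNCT.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# "OP": "?" makes the punctuation token optional (zero or one occurrence).
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True, "OP": "?"}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

results = {}
for text in ["Hello world!", "Hello, world!"]:
    doc = nlp(text)
    # Each match is a (match_id, start, end) tuple of token offsets.
    results[text] = [doc[start:end].text for _, start, end in matcher(doc)]
    print(repr(text), "->", results[text])
```

Both sentences now yield one match: "Hello world" for the first and "Hello, world" for the second.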

See the section on operators and quantifiers in the spaCy documentation for more details.

Upvotes: 2

aab

Reputation: 11474

Your pattern matches "Hello, world", with a punctuation token in the middle, not "Hello world".
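This can be checked by running the question's unchanged pattern against both inputs. The sketch assumes spaCy v3 and uses spacy.blank("en") in place of en_core_web_sm; the found dictionary is only for illustration.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English tokenizer is enough for these lexical attributes.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# The question's original pattern: exactly one punctuation token between the words.
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
matcher.add("HelloWorld", [pattern])

found = {}
for text in ["Hello, world!", "Hello world!"]:
    doc = nlp(text)
    found[text] = [doc[start:end].text for _, start, end in matcher(doc)]
    print(repr(text), "->", found[text])
```

Only the comma version produces a match, because the comma is tokenized as its own punctuation token sitting between "Hello" and "world".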

Upvotes: 2
