Is there a way to use spaCy's rule-based matching to match POS patterns?

Question

I want to use rule-based matching. I have text in which each word is annotated with its POS tag:

 text1= "it_PRON is_AUX a_DET beautiful_ADJ  apple_NOUN"

 text2= "it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN"

I want to create a rule-based matcher that extracts a match when an ADJ is followed by a NOUN, or when an ADJ is followed by a PUNCT or CCONJ, then another ADJ, then a NOUN.

So I want the output to be:

text1 = [beautiful_ADJ  apple_NOUN]
text2= [beautiful_ADJ and_CCONJ big_ADJ apple_NOUN]

I tried the following, but I couldn't find the right pattern:

import spacy
from spacy.matcher import Matcher

# Load the model first: the matcher needs nlp.vocab
nlp = spacy.load("en_core_web_sm")
matchers = {"first_processing": Matcher(nlp.vocab, validate=True)}
pattern = [{}, {}, {}]  # placeholder: we must find the right pattern
matchers["first_processing"].add("process_1", None, pattern)

doc = nlp("it_PRON is_AUX a_DET beautiful_ADJ and_CCONJ big_ADJ apple_NOUN")
matches = matchers["first_processing"](doc)
for match_id, start, end in matches:
    text = doc[start:end].text
    print(text)

Wiktor Stribiżew · Accepted Answer

As I understand it, you have texts = ["it is a beautiful apple", "it is a beautiful and big apple"] and plan to define a few Matcher patterns to extract certain POS sequences from them.
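If the input really is in the word_POS form shown in the question, the suffixes would need to be stripped before the text is passed to nlp. A minimal sketch, assuming every tag is attached to its word with a final underscore; strip_pos_tags is a hypothetical helper name, not part of spaCy:

```python
def strip_pos_tags(tagged: str) -> str:
    """Turn 'it_PRON is_AUX' into 'it is' (assumes the word_TAG format)."""
    words = []
    for token in tagged.split():
        # Split on the last underscore only, so words that themselves
        # contain underscores keep everything before the tag.
        word = token.rsplit("_", 1)[0]
        words.append(word)
    # Joining with single spaces also normalises repeated whitespace.
    return " ".join(words)


text1 = "it_PRON is_AUX a_DET beautiful_ADJ  apple_NOUN"
print(strip_pos_tags(text1))
```

Note that spaCy then assigns its own POS tags to the plain text, and these may differ from the tags in the annotated input.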

You can define a list of patterns (each one a list of token dicts) and pass them as the third and subsequent arguments to matcher.add:

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
patterns = [
    [{'POS': 'ADJ'}, {'POS': 'NOUN'}],
    [{'POS': 'ADJ'}, {'POS': 'CCONJ'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}],
    [{'POS': 'ADJ'}, {'POS': 'PUNCT'}, {'POS': 'ADJ'}, {'POS': 'NOUN'}]
]
# spaCy v2 signature; in spaCy v3 use matcher.add("process_1", patterns)
matcher.add("process_1", None, *patterns)

texts = ["it is a beautiful apple", "it is a beautiful and big apple"]
for text in texts:
    doc = nlp(text)
    matches = matcher(doc)
    for _, start, end in matches:
        print(doc[start:end].text)

# => beautiful apple
#    beautiful and big apple
#    big apple
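
The last line of that output shows that overlapping matches are all reported: "big apple" is returned alongside the longer match that contains it. If only the longest match is wanted, as in the question, the results can be filtered afterwards. A minimal sketch in plain Python, assuming matches is the list of (match_id, start, end) tuples returned by the matcher; keep_longest is a hypothetical helper, and spaCy's own spacy.util.filter_spans offers similar behaviour for Span objects:

```python
def keep_longest(matches):
    """Keep the longest non-overlapping (match_id, start, end) matches."""
    # Longest matches first; ties go to the match that starts earlier.
    ordered = sorted(matches, key=lambda m: (m[2] - m[1], -m[1]), reverse=True)
    kept = []
    used_tokens = set()
    for match_id, start, end in ordered:
        # Skip any match that shares a token with one already kept.
        if any(i in used_tokens for i in range(start, end)):
            continue
        kept.append((match_id, start, end))
        used_tokens.update(range(start, end))
    # Return the survivors in document order.
    return sorted(kept, key=lambda m: m[1])


# Token indices for "it is a beautiful and big apple":
# "beautiful and big apple" is tokens 3-7, "big apple" is tokens 5-7.
# The match id 1 is a stand-in for the hash spaCy would return.
matches = [(1, 3, 7), (1, 5, 7)]
print(keep_longest(matches))
```

In the loop above, this would be applied as `for _, start, end in keep_longest(matcher(doc)):`.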
