Akbar Hussein

Reputation: 360

Spacy matcher with regex across tokens

I have the following sentences:

phrases = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

I want to catch whatever noun comes after the verb externalize (externalizing, externalizes, etc.)

In this case, we should get:

externalize their emotions
externalize hidden emotions
externalize internalized emotions
externalize a hidden emotion
externalize some emotions
externalize their hidden trauma
externalizing internal traumas
externalized a lot of his emotions

So far I am only able to catch the noun if it comes immediately after the verb externalize.

I want to catch the noun if it appears within 15 characters after the verb. For example, 'externalize a lot of his emotions' should be matched, because 'a lot of his' is only 14 characters, counting the spaces.

Here is my working code, which is far from perfect.

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(vocab=nlp.vocab)
# match any verb immediately followed by a noun
verb_noun = [{'POS': 'VERB'}, {'POS': 'NOUN'}]
matcher.add('verb_noun', [verb_noun])

list_result = []
for phrase in phrases:
    doc = nlp(phrase)
    for match_id, start, end in matcher(doc):
        result = [token.lemma_ for token in doc[start:end]]
        # keep only matches whose verb is a form of "externalize"
        if 'externaliz' in result[0].lower():
            list_result.append(' '.join(result))

Upvotes: 1

Views: 403

Answers (1)

polm23

Reputation: 15633

I want to catch the noun if it appears within 15 characters after the verb. For example, 'externalize a lot of his emotions' should be matched, because 'a lot of his' is only 14 characters, counting the spaces.

You can do this, though I wouldn't recommend it. To do it, you would write a regex to match against the string and use Doc.char_span to turn the character offsets into a Span. Since the Matcher works on tokens, a character-based heuristic like "within 15 characters, including spaces" can't be expressed in its patterns. That kind of heuristic is also a hack and will behave erratically.
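For completeness, here's a minimal sketch of that character-window hack using plain re (no spaCy required for the matching step): find the verb, allow a gap of at most 15 characters, and capture the next word. The match offsets could then be handed to Doc.char_span. Note how fragile this is; on other sentences the greedy gap can swallow a preposition or overshoot entirely, which is exactly why I don't recommend it:

```python
import re

# one of the example sentences from the question
text = "The kid externalized a lot of his emotions through misbehavior."

# "externaliz..." then a gap of at most 15 characters, then the next word;
# the \s before the capture keeps \w+ on a whole word, not a word suffix
pattern = re.compile(r"\bexternaliz\w*.{0,15}\s(\w+)")

m = pattern.search(text)
print(m.group(0))  # externalized a lot of his emotions
print(m.group(1))  # emotions  <- the candidate noun
```

On a parsed doc, doc.char_span(m.start(), m.end()) would convert those offsets into a token Span (it returns None if the offsets don't align with token boundaries; alignment_mode="expand" relaxes that).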

I suspect what you actually want to do is figure out what is being externalized, that is, to find the object of the verb. In that case you should use the DependencyMatcher. Here's an example of using it with a simple rule and merging noun chunks:

import spacy

from spacy.matcher import DependencyMatcher
nlp = spacy.load("en_core_web_sm")

texts = ['children externalize their emotions through outward behavior',
         'children externalize hidden emotions.',
         'children externalize internalized emotions.',
         'a child might externalize a hidden emotion through misbehavior',
         'a kid might externalize some emotions through behavior',
         'traumatized children externalize their hidden trauma through bad behavior.',
         'The kid is externalizing internal traumas',
         'A child might externalize emotions though his outward behavior',
         'The kid externalized a lot of his emotions through misbehavior.']

pattern = [
  {
    "RIGHT_ID": "externalize",
    "RIGHT_ATTRS": {"LEMMA": "externalize"}
  },
  {
    "LEFT_ID": "externalize",
    "REL_OP": ">",
    "RIGHT_ID": "object",
    "RIGHT_ATTRS": {"DEP": "dobj"}
  },
]

matcher = DependencyMatcher(nlp.vocab)
matcher.add("EXTERNALIZE", [pattern])

# optional: merge noun phrases into single tokens so the full chunk prints
nlp.add_pipe("merge_noun_chunks")

# what was externalized?
for doc in nlp.pipe(texts):
    for match_id, token_ids in matcher(doc):
        # token_ids[0] is the verb ("externalize"), token_ids[1] is its object
        print(doc[token_ids[1]])

Output:

their emotions
hidden emotions
internalized emotions
a hidden emotion
some emotions
their hidden trauma
internal traumas
emotions
his outward behavior
a lot

Upvotes: 1
