Extract sentence based on regex conditions in Python

Question

I have a dataset containing 9000 sentences, from which I need two sets of 20 sentences (20/20) based on some conditions. However, when I try to match those conditions, either the sentence is output or the conditions are not met. The first 20 sentences should contain one verb.

For the second part I would like to have sentences that contain more than 2 verbs.

Right now I have the following code for checking whether the number of verbs is less than 2:

import re
import spacy
import en_core_web_md
nlp=en_core_web_md.load()

test = "This sentence has just 1 verb"
test2 = "I have put multiple verbs in this sentence because it is possible and I want it"

doc1 = nlp(test)
doc2 = nlp(test2)

empt = []
for item in doc1.sents:
    verbs = 0
    for token in item:
        if token.pos_ == "VERB":
            verbs += 1
            if verbs < 2:
                empt.append(item)

However, I end up with an empty list.

Can someone tell me what I am doing wrong so I can adjust this code for every additional condition?

tiberius · Accepted Answer

You just need to pull the last two lines back two indentation levels, so that the if check lines up with the inner for loop instead of sitting inside it. You only want to check the number of verbs in the entire sentence once, after all of its tokens have been considered.
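As a rough sketch of that fix, the count can be finished for the whole sentence before any condition is tested. The function name split_by_verb_count and the fake_sentence helper are hypothetical. The part-of-speech tags below are hand-assigned stand-ins so the example runs without a spaCy model, so they are an assumption and not real tagger output.

```python
from types import SimpleNamespace


def split_by_verb_count(sentences):
    """Return (sentences with exactly one verb, sentences with more than two)."""
    one_verb, many_verbs = [], []
    for sent in sentences:
        # First count every verb in the sentence...
        verbs = 0
        for token in sent:
            if token.pos_ == "VERB":
                verbs += 1
        # ...then test the total once, outside the token loop.
        if verbs == 1:
            one_verb.append(sent)
        elif verbs > 2:
            many_verbs.append(sent)
    return one_verb, many_verbs


def fake_sentence(tagged):
    # Hypothetical helper: builds token-like objects with a pos_ attribute.
    return [SimpleNamespace(text=t, pos_=p) for t, p in tagged]


# Hand-assigned tags (an assumption, not spaCy output).
one = fake_sentence([("This", "DET"), ("sentence", "NOUN"), ("has", "VERB"),
                     ("just", "ADV"), ("1", "NUM"), ("verb", "NOUN")])
two = fake_sentence([("I", "PRON"), ("came", "VERB"), ("and", "CCONJ"),
                     ("saw", "VERB")])
many = fake_sentence([("I", "PRON"), ("put", "VERB"), ("verbs", "NOUN"),
                      ("here", "ADV"), ("because", "SCONJ"), ("I", "PRON"),
                      ("want", "VERB"), ("and", "CCONJ"), ("need", "VERB"),
                      ("them", "PRON")])

one_verb, many_verbs = split_by_verb_count([one, two, many])
```

With a real pipeline, passing doc.sents to the function should work the same way, since each sentence yields tokens that carry a pos_ attribute. The tags spaCy assigns may differ from the hand-written ones here (for example, "has" or "have" can come out as AUX rather than VERB), so it is worth printing the tags for a few sentences before relying on the counts. Slicing each returned list with [:20] would then give the first 20 matches per condition.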
