Print words between two particular words in a given string

Question

If an occurrence of the start word is not followed by one of the end words, skip it. Here is my string:

x = 'john got shot dead. john with his .... ? , john got killed or died in 1990. john with his wife dead or died'

I want to print and count all the words between john and dead, death, or died. If a john is not followed by any of dead, died, or death before the next john, skip it and start again from the next john.

My code:

import re

x = re.sub(r'[^\w]', ' ', x)  # replace dots, commas, and other special symbols with spaces

for i in re.findall(r'(?<=john)' + '(.*?)' + '(?=dead|died|death)', x):
    print i
    print len([word for word in i.split()])

My output:

 got shot 
2
 with his          john got killed or 
6
 with his wife 
3

The output I want:

got shot
2
got killed or
3
with his wife
3

I don't know where I am making a mistake. This is just a sample input; I have to check 20,000 inputs at a time.

anubhava · Accepted Answer

You can use this negative lookahead regex:

>>> for i in re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x):
...     print i.strip()
...     print len([word for word in i.split()])
...

got shot
2
got killed or
3
with his wife
3

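Instead of your .*?, this regex uses (?:(?!john).)*?, which lazily matches zero or more characters but consumes each one only if john does not start at that position. The match can therefore never run across another john, so a john with no end word after it is skipped.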
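To see the difference in isolation, here is a minimal sketch on a shortened string. The sample string `s` and the variable names `plain` and `tempered` are made up for illustration; only the two patterns come from the question and the answer.

```python
import re

# Made-up sample: the first "john" has no end word before the next "john".
s = 'john a john b dead'

# Plain lazy match: starts after the first "john" and runs across the second one.
plain = re.findall(r'(?<=john)(.*?)(?=dead)', s)

# Guarded match: a character is consumed only where "john" does not begin,
# so the attempt from the first "john" fails and the one from the second succeeds.
tempered = re.findall(r'(?<=john)(?:(?!john).)*?(?=dead)', s)

print(plain)     # [' a john b ']
print(tempered)  # [' b ']
```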

I also suggest using word boundaries to make it match complete words:

re.findall(r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)', x)
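Since the question mentions checking many inputs, the word-boundary pattern could be compiled once and wrapped in a small helper. This is only a sketch: the names `PATTERN` and `words_between` are hypothetical, and it assumes the same punctuation stripping the question already does.

```python
import re

# Word-boundary pattern from the answer, compiled once for reuse.
PATTERN = re.compile(
    r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)'
)


def words_between(text):
    """Return (phrase, word_count) pairs for each john ... dead/died/death span."""
    # Same cleanup as in the question: non-word characters become spaces.
    cleaned = re.sub(r'[^\w]', ' ', text)
    results = []
    for match in PATTERN.findall(cleaned):
        words = match.split()
        results.append((' '.join(words), len(words)))
    return results


x = ('john got shot dead. john with his .... ? , john got killed or '
     'died in 1990. john with his wife dead or died')

for phrase, count in words_between(x):
    print(phrase)
    print(count)
```

With the word boundaries in place, a string such as 'johnny is dead' or 'john is deadly' yields no match, because neither johnny nor deadly is a complete start or end word.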
