Python Regex: negative lookbehind not directly before target word

Question

I am building an NLP baseline script in a Jupyter Notebook that should extract every mention of 'embolism' from reports. However, when the word 'no' or 'not' occurs in the same line/sentence, I do not want that mention included. This is easy with a regex when you know where the word will occur, if it occurs at all. But there can be many words in between.

Example: The scan has shown an embolism present; should be included
Example: No embolism has been found; should be excluded (this is easy with Regex)
Problem example: Currently no developing, interesting, nice, beautiful embolism has been found; should be excluded, but I have no idea how.

This is the regex for excluding 'no embolism' when the two words are directly next to each other in the sentence:

result = re.findall('(?


The error raised by the standard re module when I extend the lookbehind to span multiple words is: "error: look-behind requires fixed-width pattern"
I have searched for how to solve it, but I did not find a solution applicable to this problem. I also found that installing the regex module with pip removes the aforementioned error. However, I'm still wondering whether there is a solution for this problem with the standard module?
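
For reference, a minimal sketch of how that error can be reproduced with the standard re module. The pattern here is a hypothetical variable-width lookbehind chosen for illustration, not the original pattern from the snippet above:

```python
import re

# Hypothetical pattern: a lookbehind containing ".*" has no fixed width,
# which the standard re module rejects at compile time.
try:
    re.compile(r"(?<!\bno\b.*)embolism")
    message = None
except re.error as exc:
    message = str(exc)

print(message)
```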

The fourth bird · Accepted Answer

You can exclude the last two examples by matching them without capturing, and capture the first example (the one you want to keep) in a group.

^(?:.*\bnot?\b.*\bembolism\b.*|.*\bembolism\b.*\bnot?\b.*)|(.*\bembolism\b.*)$

Explanation

^ Start of string
(?: Non capture group
- .*\bnot?\b.*\bembolism\b.* Match no or not anywhere before embolism
- | Or
- .*\bembolism\b.*\bnot?\b.* Match the reverse order, embolism anywhere before no or not
) Close non capture group
| Or
(.*\bembolism\b.*) Capture group 1 (what you want to keep) containing embolism
$ End of string
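
A minimal sketch of how the pattern could be applied in Python, one line at a time. The function name `keep_embolism_lines` and the sample sentences are illustrative, and `re.IGNORECASE` is an assumption added here so that a capitalised "No" at the start of a sentence is also treated as a negation:

```python
import re

# The pattern from the answer; re.IGNORECASE is an added assumption.
PATTERN = re.compile(
    r"^(?:.*\bnot?\b.*\bembolism\b.*|.*\bembolism\b.*\bnot?\b.*)|(.*\bembolism\b.*)$",
    re.IGNORECASE,
)


def keep_embolism_lines(lines):
    """Return the lines that mention embolism without 'no' or 'not'."""
    kept = []
    for line in lines:
        match = PATTERN.search(line)
        # Negated lines still match, but only through the non-capturing
        # alternatives, so group 1 is None for them.
        if match and match.group(1) is not None:
            kept.append(match.group(1))
    return kept


reports = [
    "The scan has shown an embolism present",
    "No embolism has been found",
    "Currently no developing, interesting, nice, beautiful embolism has been found",
    "The lungs are clear",
]
print(keep_embolism_lines(reports))
```

Checking group 1 against None is what separates the kept lines from the excluded ones, because the excluded alternatives also produce a successful match.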

