Reputation: 399
I need to match all strings that contain one word of a list, but only if that word is not immediately preceded by another specific word. I have this regex:
.*(?<!forbidden)\b(word1|word2|word3)\b.*
It still matches a sentence like hello forbidden word1 because forbidden is matched by .*. But if I remove the .*, I no longer match strings like hello word1, which I want to match.
Note that I do want to match a string like forbidden hello word1.
Could you suggest how to fix this problem?
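For reference, a minimal reproduction with Python's built-in re module. The three sample strings are the ones mentioned above; the variable names are only for illustration.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r".*(?<!forbidden)\b(word1|word2|word3)\b.*")

samples = ["hello word1", "forbidden hello word1", "hello forbidden word1"]

# True means the whole string matched; the last sample is the unwanted match.
results = {s: pattern.fullmatch(s) is not None for s in samples}
for s, matched in results.items():
    print(f"{s!r}: {matched}")

# Without the leading .* the pattern can no longer match from the start
# of a string such as "hello word1".
stripped = re.compile(r"(?<!forbidden)\b(word1|word2|word3)\b.*")
print(stripped.match("hello word1"))
```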
Upvotes: 3
Views: 1289
Reputation: 18490
Have a look at word boundaries: \bword can never touch a word character to the left.
To disallow (word1|word2|word3) if preceded by forbidden and
one \W (non-word character):
^.*?\b(?<!forbidden\W)(word1|word2|word3)\b.*
multiple \W:
Lookbehinds need to be of fixed length in Python regex. To get around this, one idea is to use \W* outside the lookbehind, preceded by (?<!\W) to set the position to look behind from.
^.*?(?<!forbidden)(?<!\W)\W*\b(word1|word2|word3)\b.*
Regex101 demo (in the multiline demo I used [^\w\n] instead of \W so that it does not skip over lines)
Certainly a variable-width lookbehind, such as (?<!forbidden\W+), would be more comfortable. The PyPI regex module (import regex as re) supports lookbehind of variable length: See this demo
Note: If you do not need to capture anything, a (?: non-capturing group can be used as well.
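As a quick check of the two fixed-width patterns in Python's built-in re module, here is a minimal sketch. The sample strings and the names one_sep, many_sep and is_match are made up for illustration.

```python
import re

WORDS = r"(word1|word2|word3)"

# Exactly one non-word character between "forbidden" and the listed word.
one_sep = re.compile(r"^.*?\b(?<!forbidden\W)" + WORDS + r"\b.*")

# Any number of non-word characters: (?<!\W) pins the position right after
# the previous word, and (?<!forbidden) is tested at that position.
many_sep = re.compile(r"^.*?(?<!forbidden)(?<!\W)\W*\b" + WORDS + r"\b.*")

def is_match(pattern, text):
    """Return True if the pattern matches from the start of text."""
    return pattern.match(text) is not None

for text in ["hello word1",
             "forbidden hello word1",
             "hello forbidden word1",
             "hello forbidden  word1"]:  # two spaces
    print(f"{text!r}: one_sep={is_match(one_sep, text)}, "
          f"many_sep={is_match(many_sep, text)}")
```

The last sample has two spaces after forbidden: the single-\W pattern still accepts it, while the \W* variant rejects it.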
Upvotes: 3
Reputation: 328
If what you want is to match the entire string, try this:
^(.(?<!forbidden (word1|word2|word3)\b))*((?<!forbidden )\b(word1|word2|word3)\b)+(.(?<!forbidden (word1|word2|word3)\b))*$
The idea comes from this thread: Regular expression to match a line that doesn't contain a word
I've just reversed the order of the look-around.
^(.(?<!forbidden (word1|word2|word3)\b))*
is there to discard any string that contains the pattern forbidden (word1|word2|word3), and
((?<!forbidden )\b(word1|word2|word3)\b)
is what you defined.
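A small sketch of how this pattern behaves in Python's re module. The pattern is assembled from pieces only for readability (the names WORDS, GUARD and strict are illustrative) and is the same as the one above. Note that it rejects a string as soon as forbidden plus a listed word appears anywhere in it, even if another listed word occurs elsewhere.

```python
import re

WORDS = r"(word1|word2|word3)"

# Any single character, as long as the text consumed so far does not end
# in "forbidden <word>".
GUARD = r"(.(?<!forbidden " + WORDS + r"\b))*"

strict = re.compile(
    r"^" + GUARD + r"((?<!forbidden )\b" + WORDS + r"\b)+" + GUARD + r"$"
)

for text in ["hello word1",
             "forbidden hello word1",
             "hello forbidden word1",
             "forbidden word1 and word2"]:
    print(f"{text!r}: {strict.match(text) is not None}")
```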
But I can't understand why you need this requirement.
Upvotes: 0
Reputation: 9
This one seems to work well:
^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$
\b(?!(?:forbidden|word[1-3])\b)\w+
checks that the word directly before the target is neither forbidden nor word[1-3].
So it matches hi forbidden hello word1 test but not hi hello forbidden word2 test.
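A short check in Python's re module, as a sketch (the name prev_word is illustrative). Because the pattern requires a word and a single space before the target, a string that starts with the target word, such as word1 test, does not match.

```python
import re

# Pattern from this answer, unchanged.
prev_word = re.compile(r"^.*\b(?!(?:forbidden|word[1-3])\b)\w+ (word[1-3]).*$")

for text in ["hi forbidden hello word1 test",
             "hi hello forbidden word2 test",
             "hello word1",
             "word1 test"]:
    m = prev_word.match(text)
    # Group 1 holds the listed word that was found, if any.
    print(f"{text!r}: {m.group(1) if m else None}")
```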
Upvotes: 0