Reputation: 4921
I'm trying to build a regex that checks whether a word is not preceded by some other word.
I'm using a negative lookbehind, but the problem is that there can be other words in between. Here is my test string:
very pure bright and nice
I would like to match bright or nice, but only if they're not preceded by very. Here is what I've tried so far:
(?<!very (?=(.{1,20})?(bright)(?=(.{1,20})?(nice))))(nice|bright)
But this always matches the last word.
Is this possible with a regex alone, or should I consider doing it programmatically?
Upvotes: 3
Views: 687
Reputation: 4921
The solution that worked for me was to use two regular expressions, a positive one and a negative one. The positive one only checks that the phrase contains one of the required words. The negative one checks whether the unwanted word (very) is followed by one of them within 30 characters. The string matches when the positive search succeeds and the negative search fails:
#!/usr/bin/python
import re

# Positive check: the string contains one of the required words.
RE_PATTERN = re.compile(r'(bright|nice)')
# Negative check: "very" is followed, within 1 to 30 characters,
# by one of the required words.
RE_NEGATIVE_PATTERN = re.compile(r'very(?=.{1,30}(?:bright|nice))')


def match(string):
    pos_match = RE_PATTERN.search(string)
    neg_match = RE_NEGATIVE_PATTERN.search(string)
    # Match only if the positive search hits and the negative one does not.
    matches = (bool(pos_match), not neg_match)
    return all(matches)


def test_matched():
    for s in [
        'bright',
        'nice',
        'something bright',
        'something nice',
        'bright and nice',
        'nice and bright',
    ]:
        assert match(s), s


def test_not_matched():
    for s in [
        'very pure bright and nice',
        'very good',
        'very bright',
        'very nice',
        'very something nice and bright',
        'very something nice',
        'very something bright',
    ]:
        assert not match(s), s


def main():
    test_matched()
    test_not_matched()


if __name__ == '__main__':
    main()
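For what it's worth, the same two checks can probably be folded into a single pattern by putting the negative check into a negative lookahead anchored at the start of the string. This is only a sketch, not what I actually used: the names RE_COMBINED and match_combined are made up for illustration, and it assumes single-line input (the dot does not match newlines here).

```python
import re

# Hypothetical single-pattern variant of the two-regex check.
# ^(?!...) rejects the whole string if "very" is followed, within
# 1 to 30 characters, by "bright" or "nice"; the rest of the pattern
# then requires one of the two words to be present somewhere.
RE_COMBINED = re.compile(
    r'^(?!.*very(?=.{1,30}(?:bright|nice))).*(?:bright|nice)'
)


def match_combined(string):
    # True when a required word is present and no "very" precedes one
    # within the 30-character window (assumes single-line input).
    return RE_COMBINED.search(string) is not None
```

Keeping the two patterns separate is arguably easier to read and to test, so treat the combined form as an alternative rather than an improvement.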
Upvotes: 0