Reputation: 151
I have a problem regarding lookarounds in Python:
>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"
>>> q = spacereplace.sub(" and ", q)
>>> q
# What is meant to happen:
'a and b and (c or d)'
# What instead happens
'a and b and (c and or and d)'
The regex is supposed to match any space which is not next to the words "and" or "or", but this doesn't seem to be working.
Can anyone help me with this?
EDIT: In response to a commenter, I have broken the regex down into multiple lines.
(?<!\band) # Looks behind the \s, matching if there isn't a word break, followed by "and", there.
(?<!\bor) # Looks behind the \s, matching if there isn't a word break, followed by "or", there.
\s # Matches a single whitespace character.
(?!or\b) # Looks after the \s, matching if there isn't the word "or", followed by a word break there.
(?!and\b) # Looks after the \s, matching if there isn't the word "and", followed by a word break there.
Upvotes: 3
Views: 113
Reputation: 13410
You have presumably confused the raw string prefix r with b.
>>> import re
>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"
>>> spacereplace.sub(" and ", q)
'a and b and (c or d)'
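To see why the prefix matters, here is a minimal sketch (assuming Python 3; the variable names are only for illustration). Without the r prefix, the escape \b is turned into a single backspace character before the re module ever sees the pattern, so the regex looks for a literal backspace instead of a word boundary:

```python
import re

# In a normal (non-raw) literal, "\b" is a single character: backspace (code 8).
plain = '\b'
# In a raw literal, r"\b" stays as backslash + "b", which re reads as a word boundary.
raw = r'\b'

print(len(plain), ord(plain))  # 1 8
print(len(raw), raw)           # 2 \b

# With the raw-string pattern the lookarounds behave as intended.
pattern = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
result = pattern.sub(" and ", "a b (c or d)")
print(result)  # a and b and (c or d)
```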
Sometimes, if a regexp doesn't work, it may help to debug it with the re.DEBUG flag. In this case the output shows that the word boundary \b is not detected: in the non-raw pattern it is parsed as literal 8 (the backspace character), which hints at where to look for the mistake:
>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
literal 8
literal 97
literal 110
literal 100
assert_not -1
literal 8
literal 111
literal 114
in
category category_space
assert_not 1
literal 111
literal 114
literal 8
assert_not 1
literal 97
literal 110
literal 100
literal 8
>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
at at_boundary
literal 97
literal 110
literal 100
assert_not -1
at at_boundary
literal 111
literal 114
in
category category_space
assert_not 1
literal 111
literal 114
at at_boundary
assert_not 1
literal 97
literal 110
literal 100
at at_boundary
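The numbers in the dumps above are character codes, so they can be decoded to check what the pattern really contains. A small sketch, with the codes copied by hand from the first lookbehind of each dump (the list names are hypothetical):

```python
# Character codes copied from the first lookbehind in each DEBUG dump.
non_raw_codes = [8, 97, 110, 100]  # from the b'...' pattern
raw_codes = [97, 110, 100]         # from the r'...' pattern, after "at at_boundary"

non_raw_text = ''.join(chr(c) for c in non_raw_codes)
raw_text = ''.join(chr(c) for c in raw_codes)

print(repr(non_raw_text))  # '\x08and'
print(repr(raw_text))      # 'and'
```

The non-raw pattern therefore requires a backspace character in front of "and", while the raw pattern has a real boundary assertion there.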
Upvotes: 2