DragonXDoom

Reputation: 151

Lookarounds in Python

I have a problem regarding lookarounds in Python:

>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"    
>>> q = spacereplace.sub(" and ", q)
>>> q
# What is meant to happen:
'a and b and (c or d)'

# What actually happens:
'a and b and (c and or and d)'

The regex is supposed to match any whitespace character that is not next to the word "and" or "or", but it replaces the spaces around "or" as well.

Can anyone help me with this?

EDIT: In response to a commenter, I have broken the regex down into multiple lines.

(?<!\band) # Negative lookbehind: the \s must not be preceded by a word boundary followed by "and".
(?<!\bor)  # Negative lookbehind: the \s must not be preceded by a word boundary followed by "or".
\s         # Matches a single whitespace character.
(?!or\b)   # Negative lookahead: the \s must not be followed by "or" and then a word boundary.
(?!and\b)  # Negative lookahead: the \s must not be followed by "and" and then a word boundary.
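The same line-by-line breakdown can live in the source itself with the re.VERBOSE flag, which ignores whitespace inside the pattern and treats # as the start of a comment. This is a minimal sketch, not the asker's original code; note that it uses a raw string (r'''...''') so that \b reaches re as a word boundary rather than being turned into a backspace character by Python:

```python
import re

# The pattern from the breakdown above, written with re.VERBOSE so each
# piece carries its own comment. The raw string keeps \b as a word boundary.
spacereplace = re.compile(r'''
    (?<!\band)   # not preceded by the whole word "and"
    (?<!\bor)    # not preceded by the whole word "or"
    \s           # a single whitespace character
    (?!or\b)     # not followed by the whole word "or"
    (?!and\b)    # not followed by the whole word "and"
''', re.I | re.VERBOSE)

print(spacereplace.sub(" and ", "a b (c or d)"))  # a and b and (c or d)
```

Because re.I is set, the lookarounds also protect spaces around "OR" or "And".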

Upvotes: 3

Views: 113

Answers (1)

ovgolovin

Reputation: 13410

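You have probably confused the raw-string prefix r with the bytes prefix b. A b'...' literal is not raw, so Python converts \b into the backspace character (\x08) before re ever sees the pattern, and the lookarounds end up looking for a backspace instead of a word boundary.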

>>> import re
>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I)
>>> q = "a b (c or d)"
>>> spacereplace.sub(" and ", q)
'a and b and (c or d)' 
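The effect of each prefix can be checked directly. The question's transcript appears to be Python 2, where b'...' is an ordinary (non-raw) str literal; the sketch below assumes Python 3, where str and bytes literals follow the same escape rules:

```python
import re

# In a non-raw literal, Python itself turns \b into the backspace
# character (code point 8) before the regex engine sees the pattern.
print(b'\b' == b'\x08')   # True
print(len('\b'))          # 1

# With the r prefix the backslash is kept, so re receives the two
# characters "\" and "b" and interprets them as a word boundary.
print(r'\b' == '\\b')     # True
print(len(r'\b'))         # 2

# The raw and bytes prefixes can be combined if a bytes pattern is needed.
print(rb'\b' == b'\\b')   # True

# In Python 3 a bytes pattern cannot be applied to a str subject at all.
try:
    re.compile(rb'\s').search('a b')
except TypeError as exc:
    print('TypeError:', exc)
```

So in Python 3 the question's original code would raise a TypeError instead of silently producing the wrong result.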

When a regex does not behave as expected, it can help to compile it with the re.DEBUG flag. Doing that here shows that the word boundary \b is not recognised: with the b'...' literal it appears as literal 8 (the backspace character) instead of at at_boundary, which hints at where to look for the mistake:

>>> spacereplace = re.compile(b'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
  literal 8
  literal 97
  literal 110
  literal 100
assert_not -1
  literal 8
  literal 111
  literal 114
in
  category category_space
assert_not 1
  literal 111
  literal 114
  literal 8
assert_not 1
  literal 97
  literal 110
  literal 100
  literal 8


>>> spacereplace = re.compile(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)', re.I | re.DEBUG)
assert_not -1
  at at_boundary
  literal 97
  literal 110
  literal 100
assert_not -1
  at at_boundary
  literal 111
  literal 114
in
  category category_space
assert_not 1
  literal 111
  literal 114
  at at_boundary
assert_not 1
  literal 97
  literal 110
  literal 100
  at at_boundary
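Comparing the two dumps by eye works, but the check can also be automated. The sketch below captures what re.DEBUG prints (it writes to standard output) and looks for the word-boundary opcode; the helper name debug_dump is hypothetical, and the exact dump format varies between Python versions (lower-case names in Python 2, upper-case in Python 3), hence the .upper():

```python
import contextlib
import io
import re


def debug_dump(pattern):
    """Hypothetical helper: compile `pattern` with re.DEBUG and return what it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.I | re.DEBUG)
    return buf.getvalue()


# Same bytes as the question's b'...' literal: \b is a backspace byte.
# (\\s is spelled out only to avoid the invalid-escape warning that newer
# Pythons emit for \s in a non-raw literal; the resulting bytes are identical.)
broken = debug_dump(b'(?<!\band)(?<!\bor)\\s(?!or\b)(?!and\b)')

# Raw literal: \b reaches the regex engine as a word boundary.
fixed = debug_dump(r'(?<!\band)(?<!\bor)\s(?!or\b)(?!and\b)')

print('AT_BOUNDARY' in broken.upper())  # False
print('AT_BOUNDARY' in fixed.upper())   # True
```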

Upvotes: 2
