Reputation: 40878
I have a regex compiled with the verbose (re.X) flag that throws an exception, even though it seems to be equivalent to its condensed version. (I built the former from the latter.)
Condensed version:
import re
test = 'catdog'
test2 = 'dogcat'
pat = re.compile(r'(?=\b\w{6}\b)\b\w*cat\w*\b')
print(pat.search(test))
print(pat.search(test2))
# catdog Match object
# dogcat Match object
Verbose version:
pat = re.compile(r"""( # Start of group (lookahead); need raw string
?= # Positive lookahead; notation = `q(?=u)`
\b\w{6}\b # Word boundary and 6 alphanumeric characters
) # End of group (lookahead)
\b\w*cat\w*\b # Literal 'cat' in between 0 or more alphanumeric""", re.X)
print(pat.search(test).string)
print(pat.search(test2).string)
# Throws exception
# error: nothing to repeat at position 83 (line 2, column 22)
What's causing this? I can't see how the expanded version violates any condition for re.X/re.VERBOSE. From the docs:
This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments. Whitespace within the pattern is ignored, except when in a character class or when preceded by an unescaped backslash. When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.
There are no character classes or whitespace preceded by unescaped backslashes, as far as I can tell.
Upvotes: 2
Views: 227
Reputation: 280788
This is Python issue 15606. The behavior of re with whitespace inside a token in verbose mode doesn't match the documentation. You can't put whitespace in the middle of (?=.
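A minimal sketch of that behavior, using a shortened pattern of my own rather than the one from the question: whitespace between ( and ?= breaks the lookahead token, while whitespace after the complete (?= token is ignored as documented. The names split_token_error and intact are only for illustration.

```python
import re

# Whitespace between "(" and "?=" splits the lookahead token in verbose mode:
# "(" opens a plain group and "?" is then parsed as a quantifier.
try:
    re.compile(r"( ?=a)b", re.X)
    split_token_error = None
except re.error as exc:
    split_token_error = str(exc)

# Whitespace placed after the complete "(?=" token is ignored, so this compiles.
intact = re.compile(r"(?= a ) a", re.X)

print(split_token_error)
print(intact.search("a"))
```

The first pattern should fail with the same kind of "nothing to repeat" error as in the question; the second should compile and match.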
Upvotes: 3
Reputation: 33397
The issue is with the ?= on the second line.
A ? can mean several things; after [ ], for example, it is a quantifier meaning 0 or 1 spaces. That is what is happening here: the whitespace between ( and ? is ignored, but it still splits the two characters into separate tokens, so ( opens an ordinary group and ? is parsed as a quantifier with nothing before it to repeat.
Move the ?= up to the first line so that it reads (?= and it will work.
The error message
error: nothing to repeat at position 83
makes it pretty clear that ? is being interpreted as a repetition operator.
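For reference, here is one way the verbose pattern from the question could look with (?= kept together on a single line. This is a sketch of the fix described above; the layout, the comments, and the extra test word 'cat' are my own additions.

```python
import re

pat = re.compile(r"""
    (?=             # Positive lookahead; the whole token stays together
        \b\w{6}\b   # A whole word of exactly 6 word characters
    )               # End of lookahead
    \b\w*cat\w*\b   # Literal 'cat' with 0 or more word characters around it
    """, re.X)

for word in ("catdog", "dogcat", "cat"):
    m = pat.search(word)
    print(word, "->", m.group() if m else None)
```

Both six-letter words should match in full, while 'cat' alone should not, because the lookahead requires a six-character word.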
Upvotes: 2