Reputation: 1209
Say that we have an escape character \ which is only allowed to immediately precede another \. In other words, the escape character \ is only allowed to escape itself: \\. Escaping any other character is considered to be a bad escape.
\foo bad escape at position 0
\\foo ok
\\\foo bad escape at position 2
\\\\foo ok
\\\\\foo bad escape at position 4
I need to identify these bad escape characters, their position, and what they are trying to escape. We can assume that the input text does not contain newlines. Of course, I could iterate over groups of correct escapes until I find a bad one.
line = '\\\\\\'
i = 0
while i < len(line):
    curr_char = line[i]
    next_char = line[i+1] if i < len(line) - 1 else 'EOL'
    if curr_char == '\\':
        if next_char == '\\':
            i += 2
            continue
        else:
            print(f'bad escape at pos {i}: {next_char}')
            break
    else:
        i += 1
But I need a faster solution than this, which is why I would like to match the bad escape with a regular expression. My first, somewhat naive, approach was to match any backslash immediately followed by anything but a backslash: \\([^\\]|$).
import re
p = re.compile(r'\\([^\\]|$)')
p.search('\\') # [ok] matches the only backslash
p.search('\\f') # [ok] matches the only backslash
p.search('\\\\') # [err] matches the correctly escaped backslash
p.search('\\\\\\') # [ok] matches the last backslash, which indeed is a bad escape
Ok, so that doesn't work. The next logical step seems to be adding a negative lookbehind expression (?<!\\) to ignore escaped backslashes.
import re
p = re.compile(r'(?<!\\)\\([^\\]|$)')
p.search('\\') # [ok] matches the only backslash
p.search('\\f') # [ok] matches the only backslash
p.search('\\\\') # [ok] does not match anything
p.search('\\\\\\') # [err] does not match the bad escape (last backslash)
Another thing I could do is substitute each bad escape with a placeholder, but that seems rather hacky and not very efficient anyway. It also screams "there must be a better way!" :-)
import re
def f_sub(match):
    value = match.group()
    if value == '\\\\':
        return value
    return '\x00'
# bad escape before "with", before "bad" and at the end of the line
line = 'text\\\\line \\with \\\\\\bad escapes\\'
line = re.sub(r'(\\\\)|(\\([^\\]|$))', f_sub, line)
print(repr(line))
# 'text\\\\line \x00ith \\\\\x00ad escapes\x00'
Could anybody help me with this? Thanks a lot in advance!
Upvotes: 1
Views: 1573
Reputation: 785531
You may use this regex with lookarounds:
(?<!\\)(?:\\{2})*(\\)(?!\\)
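To make the pattern above easier to read, here is a sketch of the same regex written with re.VERBOSE, one component per line. The name BAD_ESCAPE and the sample string are only illustrative:

```python
import re

# Same pattern, spelled out with re.VERBOSE (BAD_ESCAPE is an illustrative name).
BAD_ESCAPE = re.compile(r"""
    (?<!\\)      # start at the beginning of a run of backslashes
    (?:\\{2})*   # skip any number of complete, valid pairs
    (\\)         # group 1: the leftover, unpaired backslash
    (?!\\)       # which must not be followed by another backslash
""", re.VERBOSE)

m = BAD_ESCAPE.search(r'ab\\\cd')
print(m.start(1))  # 4
```

The lookbehind is what anchors the match at the start of a run of backslashes. Without it, the regex could start in the middle of a valid pair and report a false positive.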
Code:
>>> import re
>>> reg = re.compile(r'(?<!\\)(?:\\{2})*(\\)(?!\\)')
>>> def badEsc(s):
...     m = reg.search(s)
...     if m:
...         print("bad escape at position " + str(m.start(1)))
...     else:
...         print("ok")
...
Testing:
>>> badEsc(r'\foo')
bad escape at position 0
>>> badEsc(r'\\foo')
ok
>>> badEsc(r'\\\foo')
bad escape at position 2
>>> badEsc(r'\\\\foo')
ok
>>> badEsc(r'\\\\\foo')
bad escape at position 4
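If you also need what each bad escape is trying to escape, and want every occurrence rather than just the first, something along these lines might work. This is a minimal sketch: find_bad_escapes is a hypothetical helper name, and reporting 'EOL' for a trailing backslash is an assumption borrowed from the loop in the question.

```python
import re

reg = re.compile(r'(?<!\\)(?:\\{2})*(\\)(?!\\)')

def find_bad_escapes(s):
    """Return (position, escaped_char) for every bad escape in s.

    escaped_char is 'EOL' when the backslash is the last character.
    find_bad_escapes is a hypothetical helper, not a library function.
    """
    results = []
    for m in reg.finditer(s):
        pos = m.start(1)  # index of the unpaired backslash
        escaped = s[pos + 1] if pos + 1 < len(s) else 'EOL'
        results.append((pos, escaped))
    return results

# Test line from the question: bad escapes before "with", before "bad" and at the end
line = 'text\\\\line \\with \\\\\\bad escapes\\'
print(find_bad_escapes(line))  # [(11, 'w'), (19, 'b'), (31, 'EOL')]
```

Since the input is assumed to contain no newlines, the character after the bad backslash can be read straight from the string instead of adding another capture group.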
Upvotes: 1