Apollo

Reputation: 9064

Allowing escape sequences in my regular expression

I'm trying to create a regular expression that finds occurrences of $VAR or ${VAR}. Given something like \$VAR or \${VAR}, it should not match. Given something like \\$VAR or \\${VAR}, or any other even number of backslashes before the $, it should match.

For example:
$BLOB matches
\$BLOB doesn't match
\\$BLOB matches
\\\$BLOB doesn't match
\\\\$BLOB matches
... etc

I'm currently using the following regex:

    line = re.sub("[^\\][\\\\]*\$(\w[^-]+)|"
                  "[^\\][\\\\]*\$\{(\w[^-]+)\}",replace,line)

However, this doesn't work properly. When I give it \$BLOB, it still matches for some reason. Why is this?

Upvotes: 1

Views: 52

Answers (2)

jfs

Reputation: 414865

To write a regular expression that matches $ unless it is escaped with E, where E can itself be escaped as EE:

    import re

    values = dict(BLOB='some value')
    def repl(m):
        return m.group('before') + values[m.group('name').strip('{}')]

    regex = r"(?<!E)(?P<before>(?:EE)*)\$(?P<name>N|\{N\})"
    regex = regex.replace('E', re.escape('\\'))
    regex = regex.replace('N', r'\w+') # name
    line = re.sub(regex, repl, line)

Using the placeholder E instead of '\\\\' lets you express your embedded language without having to think about backslashes in Python string literals and regular expression patterns.
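As a quick check, the same approach can be wrapped in a function and run against the examples from the question. This is only a sketch: the names VALUES, build_pattern and expand are hypothetical and not part of the original answer, and a variable missing from VALUES would raise a KeyError.

```python
import re

# Hypothetical lookup table, for illustration only.
VALUES = {"BLOB": "some value"}

def build_pattern():
    # E stands for the escape character, N for a variable name.
    template = r"(?<!E)(?P<before>(?:EE)*)\$(?P<name>N|\{N\})"
    template = template.replace("E", re.escape("\\"))
    return re.compile(template.replace("N", r"\w+"))

PATTERN = build_pattern()

def expand(line):
    def repl(m):
        # Keep the paired backslashes untouched and substitute the variable.
        return m.group("before") + VALUES[m.group("name").strip("{}")]
    return PATTERN.sub(repl, line)

for sample in [r"$BLOB", r"\$BLOB", r"\\$BLOB", r"\\\$BLOB", r"${BLOB}"]:
    print(sample, "->", expand(sample))
```

Inputs with an odd number of backslashes before the $ come back unchanged, while those with zero or an even number have the variable replaced and the backslashes preserved.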

Upvotes: 0

Pi Marillion

Reputation: 4674

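The run of backslashes is written as a character class, [\\\\]*, which matches zero or more single backslashes (the doubled backslash inside the class is redundant). It should be a repeating group, ((?:\\\\)*), which matches zero or more pairs of backslashes, preceded by a negative lookbehind so that no stray backslash sits in front of the pairs: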

    re.sub(r'(?<!\\)((?:\\\\)*)\$(\w[^-]+|\{(\w[^-]+)\})',r'\1' + replace, line)
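The difference can be seen by comparing the two patterns on the question's inputs. The sketch below rests on some assumptions: the first pattern is the raw-string equivalent of the first alternative of the question's non-raw literal, the second uses \w+ for the name instead of \w[^-]+, and the helper matched is hypothetical. As far as I can tell, after Python unescapes the question's literal, the regex engine sees [^\][\\]* as a single negated class (not "]", "[" or backslash) repeated zero or more times, because \] is an escaped bracket rather than the end of the class. That would explain why \$BLOB still matches: the class matches nothing and the pattern simply starts at the $.

```python
import re

# Raw-string equivalent of the question's first alternative (an assumption
# about how the non-raw literal is unescaped by Python).
original = re.compile(r"[^\][\\]*\$(\w[^-]+)")

# Lookbehind plus paired backslashes; \w+ for the name is an assumption.
fixed = re.compile(r"(?<!\\)((?:\\\\)*)\$(\w+|\{\w+\})")

def matched(pattern, text):
    """Return the matched substring, or None if there is no match."""
    m = pattern.search(text)
    return m.group(0) if m else None

for sample in [r"$BLOB", r"\$BLOB", r"\\$BLOB", r"\\\$BLOB"]:
    print(sample, matched(original, sample), matched(fixed, sample))
```

With the lookbehind version, the match has to start at a position that is not preceded by a backslash, so an odd run of backslashes can never be split into "one stray backslash plus pairs".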

Upvotes: 1
