Reputation: 608
So this doesn't work with Python's regex:
>>> re.sub('oof', 'bar\\', 'foooof')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 270, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 257, in _compile_repl
    raise error, v # invalid expression
error: bogus escape (end of line)
I thought my eyes were deceiving me, so I did this:
>>> re.sub('oof', "bar\x5c", 'foooof')
Got the same thing. I've searched and confirmed that other people have this problem. So what's wrong with treating repl as just an ordinary string? Are there additional formatting options that can be placed in repl?
Upvotes: 2
Views: 6625
Reputation: 104032
If you don't want the escapes in the replacement to be processed, you can pass a lambda instead; its return value is inserted without further processing:
>>> re.sub('oof', lambda x: 'bar\\', 'foooof')
'foobar\\'
>>> s=re.sub('oof', lambda x: 'bar\\', 'foooof')
>>> print s
foobar\
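The string literal's own escapes still apply, though, so the \r below is a real carriage return that the terminal acts on when the result is printed: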
>>> re.sub('oof', lambda x: 'bar\r\\', 'foooof')
'foobar\r\\'
>>> print re.sub('oof', lambda x: 'bar\r\\', 'foooof')
\oobar
Or, use a raw string:
>>> re.sub('oof', r'bar\\', 'foooof')
'foobar\\'
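To make the lambda approach reusable, here is a minimal sketch that wraps it in a helper. The name sub_literal is hypothetical (it is not part of the re module), and the code is written for Python 3, unlike the Python 2 sessions above.

```python
import re

def sub_literal(pattern, replacement, text):
    """Replace matches of pattern with replacement, taken verbatim.

    sub_literal is a hypothetical helper name, not part of the re module.
    """
    # The return value of a callable is inserted as-is; re does not scan
    # it for backslash escapes or group references.
    return re.sub(pattern, lambda match: replacement, text)

result = sub_literal('oof', 'bar\\', 'foooof')
print(result)  # foobar\
```

The pattern argument is still an ordinary regular expression here; only the replacement is taken literally.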
Upvotes: 4
Reputation: 16499
Yes, the replacement string is processed for escape characters. From the docs:
repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.
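A short sketch of what that processing looks like in practice, assuming Python 3. The variable names are only for illustration.

```python
import re

# \n in the replacement template becomes a real newline character.
with_newline = re.sub('oof', r'bar\n', 'foooof')
print(repr(with_newline))  # 'foobar\n'

# \1 is a backreference to whatever group 1 matched (here 'oooo').
with_group = re.sub(r'(o+)f', r'<\1>', 'foooof')
print(with_group)  # f<oooo>

# A lone trailing backslash is an incomplete escape, so re rejects it.
trailing_error = None
try:
    re.sub('oof', 'bar\\', 'foooof')
except re.error as exc:
    trailing_error = exc
print(type(trailing_error).__name__)
```

Because the template is parsed for escapes and backreferences, a replacement ending in a single backslash has nothing left to escape, which is why the call in the question fails.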
Upvotes: 4
Reputation: 2219
Did you expect foobar\ as the output? If so, re.sub('oof', r'bar\\', 'foooof') is what you need; the r prefix tells Python to treat what follows as a raw string, so backslashes are kept as backslashes instead of acting as a sign that the following character needs to be treated specially. Both backslashes therefore reach re.sub, which reads the pair as one literal backslash. Here's a section in the documentation that explains this in more detail.
Upvotes: -1
Reputation: 98078
Use raw strings:
re.sub('oof', r'bar\\', 'foooof')
Without the r prefix, you need to double-escape the backslashes:
re.sub('oof', 'bar\\\\', 'foooof')
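When the replacement text comes from a variable rather than a literal, the same doubling can be done at runtime. This is a minimal sketch for Python 3; escape_repl is a hypothetical helper name, and the 'DIR' placeholder and path are made up for the example.

```python
import re

# Both literals spell the same string: 'bar' followed by two backslashes.
assert r'bar\\' == 'bar\\\\'

def escape_repl(replacement):
    """Double every backslash so re.sub inserts the text literally.

    escape_repl is a hypothetical helper name, not part of the re module.
    """
    return replacement.replace('\\', '\\\\')

path = 'C:\\temp'  # contains one real backslash
result = re.sub('DIR', escape_repl(path), 'cd DIR')
print(result)  # cd C:\temp
```

Without the doubling, the \t in the path would be read by re.sub as a tab escape. Only backslashes need escaping in a replacement string, so re.escape is not the right tool here.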
Upvotes: 2