chaz

Reputation: 608

re.sub tries to escape repl string?

So this doesn't work with Python's re module:

>>> re.sub('oof', 'bar\\', 'foooof')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "C:\Python27\lib\re.py", line 270, in _subx
    template = _compile_repl(template, pattern)
  File "C:\Python27\lib\re.py", line 257, in _compile_repl
    raise error, v # invalid expression
error: bogus escape (end of line)

I thought my eyes were deceiving me, so I did this:

>>> re.sub('oof', "bar\x5c", 'foooof')

Got the same thing. I've searched and have confirmed that other people have this problem. So what's the problem with treating repl as just an ordinary string? Are there additional formatting options that can be placed in repl?

Upvotes: 2

Views: 6625

Answers (4)

dawg

Reputation: 104032

If you don't want escapes in the replacement string to be processed, you can pass a function (for example a lambda) instead; its return value is inserted as is:

>>> re.sub('oof', lambda x: 'bar\\', 'foooof')
'foobar\\'
>>> s=re.sub('oof', lambda x: 'bar\\', 'foooof')
>>> print s
foobar\

But escapes in the Python string literal itself, such as \r, are still interpreted, which shows up when the result is printed:

>>> re.sub('oof', lambda x: 'bar\r\\', 'foooof')
'foobar\r\\'
>>> print re.sub('oof', lambda x: 'bar\r\\', 'foooof')
\oobar

Or, use a raw string:

>>> re.sub('oof', r'bar\\', 'foooof')
'foobar\\'
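The function form is handy when the replacement text is not known in advance. Below is a minimal sketch written for Python 3; `sub_verbatim` and `windows_path` are hypothetical names used only for illustration, not part of the re module.

```python
import re

def sub_verbatim(pattern, replacement, text):
    """Hypothetical helper: insert `replacement` exactly as given."""
    # The return value of a callable repl is not parsed as a template,
    # so backslashes in it need no special treatment.
    return re.sub(pattern, lambda match: replacement, text)

windows_path = 'C:\\temp\\'   # ends in a single backslash
result = sub_verbatim('oof', windows_path, 'foooof')
print(result)
```

Wrapping the string in a lambda avoids having to escape it at all, at the cost of losing backreferences such as \1 in the replacement.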

Upvotes: 4

Kyle Strand

Reputation: 16499

Yes, the replacement string is processed for escape characters. From the docs:

repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \j are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern.
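A short sketch of the behaviour described in that passage, written for Python 3 (the variable names are just for illustration). Note that the quoted text is from the Python 2 documentation; on Python 3.7 and later, unknown escapes made of a backslash and an ASCII letter are rejected rather than left alone, so the example sticks to escapes that behave the same in both.

```python
import re

text = 'foooof'

# Template escape: backslash + n in the template becomes a real newline.
with_newline = re.sub('oof', r'bar\n', text)

# Backreference: \1 expands to whatever group 1 matched (here, 'o').
with_backref = re.sub('(o)of', r'\1bar', text)

# A lone trailing backslash is an incomplete escape, so the template
# is rejected, which is the error from the question.
try:
    re.sub('oof', 'bar\\', text)
    trailing_ok = True
except re.error:
    trailing_ok = False

print(repr(with_newline), repr(with_backref), trailing_ok)
```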

Upvotes: 4

smooth reggae

Reputation: 2219

Did you expect foobar\ as the output? If so, re.sub('oof', r'bar\\', 'foooof') is what you need; the r tells Python to treat what follows as a raw string, so backslashes are treated as backslashes instead of as a sign that the following character needs to be treated specially. Here's a section in the documentation that explains this in more detail.

Upvotes: -1

perreal

Reputation: 98078

Use raw strings:

re.sub('oof', r'bar\\', 'foooof')

Without the r prefix, you need to double-escape the backslashes:

re.sub('oof', 'bar\\\\', 'foooof')
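To see that the two spellings are the same template, and how the same idea applies to a replacement built at run time, here is a small sketch for Python 3. The variable names are illustrative, and doubling backslashes with str.replace is one possible approach, not the only one.

```python
import re

raw_template = r'bar\\'      # raw literal: b, a, r, backslash, backslash
plain_template = 'bar\\\\'   # plain literal spelling of the same 5 characters
same_template = raw_template == plain_template

# re.sub turns the two-backslash escape into one literal backslash.
result = re.sub('oof', raw_template, 'foooof')

# For a replacement string only known at run time, doubling each
# backslash first makes re.sub emit the string unchanged.
dynamic = 'bar\\'            # ends in a single backslash
escaped_result = re.sub('oof', dynamic.replace('\\', '\\\\'), 'foooof')

print(same_template, result, escaped_result)
```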

Upvotes: 2
