prooffreader

Reputation: 2443

Python re.sub not returning match

In my brain, the following:

>>> re.sub('([eo])', '_\1_', 'aeiou')

should return:

'a_e_i_o_u'

instead it returns:

'a_\x01_i_\x01_u'

I'm sure I'm having a brain cramp, but I can't for the life of me figure out what's wrong.

Upvotes: 1

Views: 416

Answers (2)

Padraic Cunningham

Reputation: 180391

Use a raw string (the r prefix) for the replacement, so the backslash reaches re.sub intact:

re.sub('([eo])', r'_\1_', 'aeiou')

Output:

In [3]: re.sub('([eo])', r'_\1_', 'aeiou')
Out[3]: 'a_e_i_o_u'
In [4]: "\1"
Out[4]: '\x01'
In [5]: r"\1"
Out[5]: '\\1'
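
To make the difference concrete, here is a minimal sketch that compares the two literals directly. The variable names plain and raw are only illustrative:

```python
# In a normal literal, \1 is an octal escape and collapses to one character, chr(1).
plain = '_\1_'
# In a raw literal, the backslash and the digit stay as two separate characters.
raw = r'_\1_'

print(len(plain), len(raw))   # 3 4
print(plain == '_\x01_')      # True
print(raw == '_\\1_')         # True
```

The raw version is the one that still contains a backslash for re.sub to interpret as a group reference.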

Upvotes: 2

Martijn Pieters

Reputation: 1121544

In a regular Python string literal, \1 is an octal escape that produces the character \x01, so re.sub never sees a backreference. Double the backslash, or use a raw string literal:

>>> import re
>>> re.sub('([eo])', '_\1_', 'aeiou')
'a_\x01_i_\x01_u'
>>> re.sub('([eo])', '_\\1_', 'aeiou')
'a_e_i_o_u'
>>> re.sub('([eo])', r'_\1_', 'aeiou')
'a_e_i_o_u'

See "The Backslash Plague" in the Python Regular Expression HOWTO:

As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.
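
As a side note, two other standard re.sub forms may be useful here: the explicit \g<1> group reference, and a callable replacement. This is just a sketch of alternatives, not something the question requires; the names text, pattern, with_g and with_func are made up for the example:

```python
import re

text = 'aeiou'
pattern = r'([eo])'

# \g<1> is the explicit group-reference syntax. It still relies on a backslash
# reaching re.sub, so the replacement is written as a raw string here too.
with_g = re.sub(pattern, r'_\g<1>_', text)

# A callable replacement receives the match object and builds the string
# itself, so the replacement contains no backslash at all.
with_func = re.sub(pattern, lambda m: '_' + m.group(1) + '_', text)

print(with_g)     # a_e_i_o_u
print(with_func)  # a_e_i_o_u
```

The \g<1> form helps when the group reference is followed by a digit, since \10 would otherwise be read as group 10. The callable form sidesteps the escaping question entirely.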

Upvotes: 4
