Say, I have a simple string
str = "hello hello hello 123"
In Python, to replace every word "hello" with "<>", I use
re.sub("\bhello\b",'<>',str)
In Ruby 1.8.7, I use
str.gsub!(/\bhello\b/,'<>')
The Ruby interpreter works as expected and replaces every word "hello". Python, however, doesn't: it doesn't match a single occurrence of "hello".
My questions are:
You have to make the pattern a raw string, because in a normal string literal Python interprets \b as an escape sequence (a backspace) before the regex engine ever sees it:
>>> s = "hello hello hello 123"
>>> import re
>>> re.sub(r"\bhello\b",r'<>',s)
'<> <> <> 123'
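One further difference from the Ruby version: gsub! modifies the string in place, whereas re.sub returns a new string (Python strings are immutable). In the interactive session above the result is only echoed. A minimal sketch, assuming a variable named s as in the example, of keeping the result:

```python
import re

s = "hello hello hello 123"

# re.sub leaves s untouched and returns a new string...
re.sub(r"\bhello\b", "<>", s)
print(s)  # hello hello hello 123

# ...so assign the result to keep it.
s = re.sub(r"\bhello\b", "<>", s)
print(s)  # <> <> <> 123
```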
Note: never name a variable str, as that shadows the built-in str type.
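To illustrate the shadowing problem, here is a small sketch (the function name shadowing_demo is hypothetical): once the name str is bound to a string, calling str(...) in that scope fails, while the built-in is untouched elsewhere.

```python
def shadowing_demo():
    # Inside this function, `str` is a local string, hiding the built-in type.
    str = "hello hello hello 123"
    try:
        return str(42)  # tries to call a string object
    except TypeError as exc:
        return f"TypeError: {exc}"

print(shadowing_demo())  # a TypeError message: the string is not callable
print(str(42))           # outside the function the built-in str still works
```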
Python string literals interpret backslashes as escape codes; \b is a backspace character. Either double the backslash or use a raw string literal:
re.sub("\\bhello\\b", '<>', inputstring)
or
re.sub(r"\bhello\b", '<>', inputstring)
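Both spellings produce exactly the same pattern text, so they behave identically. A quick sketch checking that, using a sample input string assumed for illustration:

```python
import re

text = "hello hello hello 123"

# "\\b" in a normal literal is the two characters '\' and 'b', same as r"\b".
print("\\bhello\\b" == r"\bhello\b")  # True

escaped = re.sub("\\bhello\\b", "<>", text)
raw = re.sub(r"\bhello\b", "<>", text)
print(escaped)  # <> <> <> 123
print(raw)      # <> <> <> 123
```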
Compare:
>>> print "\bhello\b"
hello
>>> print r"\bhello\b"
\bhello\b
>>> len("\bhello\b"), len(r"\bhello\b")
(7, 9)
See The Backslash Plague section of the Python regex HOWTO:
As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.[...]
The solution is to use Python’s raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with 'r', so r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation.
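The HOWTO's r"\n" versus "\n" example can be checked directly; the same applies to "\b", which is the backspace character U+0008 unless the r prefix is used:

```python
raw_n = r"\n"
plain_n = "\n"
print(len(raw_n), list(raw_n))      # 2 ['\\', 'n']
print(len(plain_n), list(plain_n))  # 1 ['\n']

# The escape behind the original problem: "\b" is a single backspace character.
print("\b" == "\x08", len("\b"), len(r"\b"))  # True 1 2
```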
Demo:
>>> import re
>>> inputstring = "hello hello hello 123"
>>> re.sub("\bhello\b", '<>', inputstring)
'hello hello hello 123'
>>> re.sub(r"\bhello\b", '<>', inputstring)
'<> <> <> 123'
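This also explains why the non-raw pattern "matches nothing" rather than raising an error: it is a valid regex that looks for literal backspace characters around "hello". A sketch, with sample inputs assumed for illustration:

```python
import re

# Without the r prefix the pattern contains real backspace characters (U+0008).
pattern = "\bhello\b"
print(pattern == "\x08hello\x08")  # True

# Ordinary text has no backspaces, so nothing is replaced...
print(re.sub(pattern, "<>", "hello hello"))          # hello hello
# ...but text that really contains backspaces does match.
print(re.sub(pattern, "<>", "\x08hello\x08 world"))  # <> world
```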