GokuShanth

Reputation: 203

Regular Expressions working differently in Python and Ruby

Say I have a simple string:

str = "hello hello hello 123"

In Python, to replace every whole word "hello" with "<>", I use

re.sub("\bhello\b",'<>',str)

In Ruby 1.8.7, I use

str.gsub!(/\bhello\b/,'<>')

The Ruby interpreter works as expected and replaces every whole word hello. Python doesn't: it fails to match even a single hello.

My questions are:

Upvotes: 3

Views: 212

Answers (2)

Bhargav Rao

Reputation: 52151

You have to make the pattern a raw string, because in a normal Python string literal \b is interpreted as a backspace character rather than as the regex word boundary:

>>> s = "hello hello hello 123"
>>> import re
>>> re.sub(r"\bhello\b",r'<>',s)
'<> <> <> 123'

Note: never name a string str, because that shadows the built-in str type.
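To illustrate the shadowing problem, here is a minimal sketch. The function names describe and replace_hello are made up for this example and are not from the question; the point is only that once a name called str refers to a string, the built-in can no longer be called through it in that scope.

```python
import re

def describe(str):
    # Hypothetical helper: inside this function, `str` is the argument,
    # not the built-in type, so calling it fails.
    try:
        return str(123)
    except TypeError:
        return "str is shadowed: cannot call it"

def replace_hello(text):
    # Using a different name (`text`) leaves the built-in str() usable.
    return re.sub(r"\bhello\b", "<>", str(text))

shadowed = describe("hello hello hello 123")
replaced = replace_hello("hello hello hello 123")
```

Here shadowed ends up holding the fallback message, while replace_hello can still call str() because nothing in its scope hides it.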

Upvotes: 4

Martijn Pieters

Reputation: 1123520

Python strings interpret backslashes as escape codes; \b is a backspace character. Either double the backslash or use a raw string literal:

re.sub("\\bhello\\b", '<>', inputstring)

or

re.sub(r"\bhello\b", '<>', inputstring)

Compare:

>>> print "\bhello\b"
hello
>>> print r"\bhello\b"
\bhello\b
>>> len("\bhello\b"), len(r"\bhello\b")
(7, 9)

See The Backslash Plague section of the Python regex HOWTO:

As stated earlier, regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals.

[...]

The solution is to use Python’s raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with 'r', so r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation.
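As a rough sketch of what the HOWTO passage means in practice (written for Python 3; the variable names are only for illustration), the non-raw pattern is still a valid regex, but it looks for literal backspace characters instead of word boundaries:

```python
import re

text = "hello hello hello 123"

plain_pattern = "\bhello\b"   # Python turns each \b into a backspace (\x08)
raw_pattern = r"\bhello\b"    # re receives backslash + b: a word boundary

newline_lengths = (len("\n"), len(r"\n"))   # one character vs. two

# The two spellings of the working pattern are the same string.
same_pattern = raw_pattern == "\\bhello\\b"

# The non-raw pattern only matches text that really contains backspaces.
no_match = re.search(plain_pattern, text)
backspace_match = re.search(plain_pattern, "\x08hello\x08")

fixed = re.sub(raw_pattern, "<>", text)
```

So the original call did not fail because of a difference between the Ruby and Python regex engines; the pattern that reached the re module simply was not the one that was typed.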

Demo:

>>> import re
>>> inputstring = "hello hello hello 123"
>>> re.sub("\bhello\b", '<>', inputstring)
'hello hello hello 123'
>>> re.sub(r"\bhello\b", '<>', inputstring)
'<> <> <> 123'
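If the word to replace comes from a variable, one possible way to keep the raw-string habit is sketched below. replace_word is a hypothetical helper, not something from the question, and it assumes the word starts and ends with a word character (otherwise \b will not sit where you expect):

```python
import re

def replace_word(text, word, replacement):
    # Hypothetical helper: build the pattern from raw-string pieces so the
    # \b word boundaries reach the re module intact.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement keeps `replacement` literal, so backslashes
    # in it are not processed as escapes by re.sub.
    return re.sub(pattern, lambda match: replacement, text)

result = replace_word("hello hello hello 123", "hello", "<>")
```

re.escape guards against regex metacharacters in the word, and the word boundaries keep the helper from touching longer words that merely contain it.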

Upvotes: 5
