Unexpected behaviour of string.replace() in Python

Question

I discovered today that the function string.replace(str1, str2) in Python 3 as well as Python 2 does not behave in the way I instinctively thought it would:

$ python3
Python 3.4.2 (default, Oct  8 2014, 10:45:20) 
[GCC 4.9.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> str = ' not not not not Cannot not not not '.replace(' not ', ' NOT ')
>>> str
' NOT not NOT not Cannot NOT not NOT '

I understand why this happens: apparently the replace function, once it finds a match, resumes searching at the first character after that match, which in my case is the n of the next not. Hence the second (and fourth...) not is never recognized, because its leading space has already been consumed by the previous match.
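The scanning behaviour described above can be sketched in plain Python. This is only an illustration: the helper name replace_non_overlapping is hypothetical, and the real method is implemented in C inside the interpreter. It is meant to show the same left-to-right, non-overlapping search, assuming a non-empty search string.

```python
def replace_non_overlapping(text, old, new):
    """Illustrative stand-in for str.replace (hypothetical helper, not the real implementation)."""
    if not old:
        raise ValueError("this sketch only handles a non-empty search string")
    pieces = []
    pos = 0
    while True:
        hit = text.find(old, pos)
        if hit == -1:
            # No further match: keep the remaining tail unchanged.
            pieces.append(text[pos:])
            break
        pieces.append(text[pos:hit])  # text between the previous match and this one
        pieces.append(new)
        # Resume AFTER the whole match, so its trailing space is already used up.
        pos = hit + len(old)
    return "".join(pieces)


sample = ' not not not not Cannot not not not '
result = replace_non_overlapping(sample, ' not ', ' NOT ')
print(repr(result))
```

Because each match of ' not ' includes its trailing space, the next search starts at an n, and the not that follows has no leading space left to match.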

What is the standard way to replace strings to avoid the counter-intuitive behaviour above (so that all ␣not␣s are capitalized)?

I understand that I can split my string into tokens, change the nots to NOTs and recombine them, but that is not what I am looking for. I suspect there is a proper way to do this replacement in Python.

dsh · Accepted Answer

import re

s = re.sub(r"\bnot\b", "NOT", s)

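Use a regular expression that matches word boundaries rather than trying to match the spaces between words. \b is a zero-width assertion: it consumes no characters, so the space after one match is still available as the boundary before the next. Every standalone not is replaced, while the not inside Cannot is left alone because there is no word boundary between its two n characters.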
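As a rough, self-contained check, the sketch below applies the word-boundary pattern to the string from the question. It also shows a stricter variant, which is an addition not taken from the answer: lookarounds that require whitespace (or the start or end of the string) on both sides. That variant stays closer to the original ' not ' intent when punctuation is involved. The variable names are illustrative.

```python
import re

sample = ' not not not not Cannot not not not '

# \b is zero-width, so neighbouring matches can share the same space.
by_word_boundary = re.sub(r'\bnot\b', 'NOT', sample)

# Stricter variant (an assumption about the intent, not part of the answer):
# match "not" only when it is not adjacent to any non-whitespace character.
by_whitespace = re.sub(r'(?<!\S)not(?!\S)', 'NOT', sample)

print(repr(by_word_boundary))
print(repr(by_whitespace))

# The two patterns differ once punctuation touches the word.
punctuated = 'not, not-so not'
print(re.sub(r'\bnot\b', 'NOT', punctuated))
print(re.sub(r'(?<!\S)not(?!\S)', 'NOT', punctuated))
```

On the question's input both patterns give the same result. On the punctuated string, \b also rewrites not, and not-so, whereas the lookaround version only touches the final, whitespace-delimited not.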
