Python Unicode String Replacement: u, r or nothing

Question

Hi, take a look at the following code snippet for Python 2.7:

# -*- coding: utf-8 -*-
content = u"和製英語とかカタカナ英語、
ジャパングリッシュなどと呼ばれる英語っぽいけど実は英語じゃない言葉があります。"
#print content
print content.replace(u"",u"
").replace(u"
",u"
").replace(u"
",u"")
print content.replace("","
").replace("
","
").replace("
","")
print content.replace(r"",r"
").replace(r"
",r"
").replace(r"
",r"")

All three print statements produce the same result:

和製英語とかカタカナ英語、ジャパングリッシュなどと呼ばれる英語っぽいけど実は英語じゃない言葉があります。

My question is: is there any difference between the three "replace" statements (u prefix, r prefix, or no prefix)? Which one is best?

Mark Tolonen · Accepted Answer

The first one is best. In the other two, the arguments are byte strings (the r prefix only changes how backslash escapes are read; without u the literal is still a byte string), so Python has to implicitly convert them to Unicode before it can do the replacement on the Unicode content string. With the strings provided, the result happens to be the same. If the arguments contained non-ASCII characters, the other two would raise a UnicodeDecodeError, because the default codec for the implicit conversion is ascii on Python 2.X.
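
The implicit conversion can be made visible by spelling it out. The following is a minimal sketch, not Python's actual implementation: replace_like_py2 is a hypothetical helper that decodes byte-string arguments with the ascii codec before calling replace, which approximates what Python 2 does behind the scenes. The sample strings are made up for illustration. It is written so that it should also run on Python 3, where the implicit conversion no longer exists.

```python
# -*- coding: utf-8 -*-

def replace_like_py2(content, old, new):
    # Hypothetical helper: decode byte-string arguments with the ascii codec,
    # approximating the coercion Python 2 applies implicitly.
    if isinstance(old, bytes):
        old = old.decode("ascii")
    if isinstance(new, bytes):
        new = new.decode("ascii")
    return content.replace(old, new)

content = u"英語 and more 英語"

# ASCII-only byte strings decode cleanly, so the result matches the
# call that uses unicode arguments directly.
same = replace_like_py2(content, b"and", b"&") == content.replace(u"and", u"&")

# A non-ASCII byte string (UTF-8 encoded here) cannot be decoded as ascii.
utf8_bytes = u"英語".encode("utf-8")
try:
    replace_like_py2(content, utf8_bytes, b"English")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# The r prefix only affects backslash handling, not the string type.
raw_len = len(r"\n")    # two characters: a backslash and the letter n
plain_len = len("\n")   # one character: a newline

print(same)
print(decode_failed)
```

Passing u-prefixed arguments from the start avoids both the extra decode step and the risk of the ascii failure.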

Note the speed difference as well:

C:\>python -m timeit -s "content=u'blah
blah
'" "content.replace(u'',u'
').replace(u'
',u'
').replace(u'
',u'')"
1000000 loops, best of 3: 1.09 usec per loop

C:\>python -m timeit -s "content=u'blah
blah
'" "content.replace('','
').replace('
','
').replace('
','')"
1000000 loops, best of 3: 1.76 usec per loop

C:\>python -m timeit -s "content=u'blah
blah
'" "content.replace(r'',r'
').replace(r'
',r'
').replace(r'
',r'')"
1000000 loops, best of 3: 1.75 usec per loop
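
A similar comparison can be run from inside Python with the timeit module instead of the command line. This is only a rough sketch under stated assumptions: the content string and the 'blah' and 'x' arguments are placeholders, and the explicit decode("ascii") calls stand in for Python 2's implicit conversion so that the script also runs on Python 3. Absolute numbers will vary by machine and interpreter version.

```python
import timeit

# Placeholder setup string; any unicode content would do.
setup = "content = u'blah blah blah'"

# Arguments that are already unicode: no conversion needed.
unicode_args = "content.replace(u'blah', u'x')"

# Byte-string arguments decoded on every call, standing in for the
# implicit ascii decoding that Python 2 performs.
decoded_args = "content.replace(b'blah'.decode('ascii'), b'x'.decode('ascii'))"

t_unicode = timeit.timeit(unicode_args, setup=setup, number=100000)
t_decoded = timeit.timeit(decoded_args, setup=setup, number=100000)

print("unicode arguments:   %.4f s" % t_unicode)
print("decode on each call: %.4f s" % t_decoded)
```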
