Python Raw-Unicode-Escape encoding

Question

I am reading the Python 2.7 documentation, and I don't understand the Raw-Unicode-Escape encoding. The relevant passage is below:

For experts, there is also a raw mode just like the one for normal strings. You have to prefix the opening quote with ‘ur’ to have Python use the Raw-Unicode-Escape encoding. It will only apply the above \uXXXX conversion if there is an uneven number of backslashes in front of the small ‘u’.

I wonder why the number of backslashes has to be uneven. Is it just an arbitrary rule, or is there a reason behind it?

cco · Accepted Answer

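\uXXXX escapes are handled specially in Python 2 raw Unicode strings, as the text you quoted describes. ur'\\\\' is a string containing four backslashes, while ur'\\\u0020\\' is four backslashes and a space (two backslashes, a space, then two more backslashes), because three backslashes, an odd number, precede the u. If I had to guess why there has to be an uneven number of backslashes for the \u to be recognized, I'd guess that it was because the non-raw string parser works like that too: there, an even run of backslashes pairs up into escaped literal backslashes, so only an odd run leaves one backslash free to start the \u escape (I haven't looked at the source to be sure).
The question of why probably comes down to "because that's the way it was defined" for Python 2. Python 3 doesn't do that anymore: raw strings leave \u untouched, so r'\\\u0020\\' is the same as '\\\\\\u0020\\\\'.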
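If you want to see the odd/even rule in action without a Python 2 interpreter, here is a minimal sketch for Python 3. It assumes that the built-in 'raw_unicode_escape' codec follows the same rule as the old ur'' literals (as far as I know it does); the helper name decode_raw_unicode_escape is just something I made up for illustration. Note that the ur prefix itself is a syntax error in Python 3, so the input is written as a raw bytes literal instead.

```python
# Python 3 sketch. Assumption: the 'raw_unicode_escape' codec applies the same
# "odd number of backslashes" rule that Python 2 used for ur'...' literals.

def decode_raw_unicode_escape(data: bytes) -> str:
    """Hypothetical helper: decode bytes the way a Python 2 ur'' literal was read."""
    return data.decode("raw_unicode_escape")

# Three backslashes before the u (odd): \u0020 is converted to a space.
odd = decode_raw_unicode_escape(rb"\\\u0020\\")

# Two backslashes before the u (even): the text u0020 is left alone.
even = decode_raw_unicode_escape(rb"\\u0020")

# A plain Python 3 raw string never converts \u, whatever the backslash count.
py3_raw = r"\\\u0020\\"

print(len(odd))      # 5: two backslashes, a space, two backslashes
print(len(even))     # 7: two backslashes followed by the six characters u0020
print(len(py3_raw))  # 10: nothing was converted
```

So the codec result for the odd case matches what ur'\\\u0020\\' gave in Python 2, while the Python 3 raw literal keeps all ten characters.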
