I'm learning about encodings in Python, and I ran into the following situation, which seems weird to me. I'd like to know why it happens.
First, my environment: OS X 10.10.3.
The output of echo $LC_CTYPE, $LANG is:
en_US.UTF-8, en_US.UTF-8
The output of python --version is Python 2.7.6.
Then I run python to enter the Python shell:
>>> import sys; reload(sys); sys.setdefaultencoding('utf8')
<module 'sys' (built-in)>
>>> s16 = u'我'.encode('utf16')
>>> s16
'\xff\xfe\x11b'
>>> for c in s16:
...     ord(c)
...
255
254
17
98
>>> s16_ = '\xff\xfe\x11\x62'
>>> s16_
'\xff\xfe\x11b'
So my question is: why does Python display both s16 and s16_ (the last line) as '\xff\xfe\x11b' instead of '\xff\xfe\x11\x62'?
b (byte 0x62) is a printable character, so repr() shows the character itself rather than the escaped form \x62.
Reference, from the documentation of str.isprintable:
"Note that printable characters in this context are those which should not be escaped when repr() is invoked on a string."
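A minimal sketch of that rule, written for Python 3 (where bytes plays the role of Python 2's str, and repr() adds a b prefix). The function escape_bytes is hypothetical, not part of Python, and only mimics repr() for data that contains no quote characters, because repr() also picks and escapes the surrounding quotes.

```python
# Hypothetical helper (not part of Python): mimics how Python 3's repr()
# renders a bytes object whose contents contain no quote characters.
def escape_bytes(data: bytes) -> str:
    special = {0x09: "\\t", 0x0A: "\\n", 0x0D: "\\r", 0x5C: "\\\\"}
    parts = []
    for byte in data:
        if byte in special:
            parts.append(special[byte])      # named escapes and the backslash
        elif 0x20 <= byte < 0x7F:
            parts.append(chr(byte))          # printable ASCII is shown as-is
        else:
            parts.append("\\x%02x" % byte)   # everything else is hex-escaped
    return "b'" + "".join(parts) + "'"


s16 = b"\xff\xfe\x11\x62"
print(escape_bytes(s16))   # b'\xff\xfe\x11b'
print(repr(s16))           # b'\xff\xfe\x11b'
```

Byte 0x62 falls in the printable ASCII range 0x20 to 0x7E, so it comes out as b, while 0xFF, 0xFE and 0x11 do not and are hex-escaped.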
When Python displays bytes (str in Python 2), it shows the corresponding ASCII character for each byte if that character is printable, and a hex escape otherwise. \x62 corresponds to ASCII 'b'. You can see this by looking at that byte on its own:
>>> '\x62'
'b'
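The same check can be sketched in Python 3 with bytes literals. This assumes Python 3.5 or later (for bytes.hex()) and uses the 'utf-16-le' codec instead of the question's 'utf16', so the result has no byte-order mark and does not depend on the machine's byte order; the last lines show one way to force every byte into \xNN form.

```python
# Python 3 sketch: b'\x62' and b'b' are the same single byte, 0x62.
assert b"\x62" == b"b"

# 'utf-16-le' is used instead of 'utf-16' so there is no BOM and the
# result does not depend on the platform's byte order.
encoded = "我".encode("utf-16-le")
print(encoded)          # b'\x11b'
print(list(encoded))    # [17, 98]

# To see every byte as a \xNN escape, format each byte explicitly.
all_escaped = "".join("\\x%02x" % byte for byte in encoded)
print(all_escaped)      # \x11\x62
print(encoded.hex())    # 1162
```

Only the display differs: the stored bytes are identical whether you write \x62 or b.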