Reputation: 406
I have a string x defined as below:
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'
In IPython on Python 2:
In [10]: x
Out[10]: 'LF \xa9 2020 by S&P Global Inc.,200523\n'
In [11]: print(x)
LF � 2020 by S&P Global Inc.,200523
In [12]: x.decode('ISO-8859-1')
Out[12]: u'LF \xa9 2020 by S&P Global Inc.,200523\n'
In [13]: print(x.decode('ISO-8859-1'))
LF © 2020 by S&P Global Inc.,200523
Question 1: Why is the output for x different from the output of print(x)? The same goes for x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')).
In IPython on Python 3:
In [3]: x
Out[3]: b'LF \xa9 2020 by S&P Global Inc.,200523\n'
In [4]: print(x)
b'LF \xa9 2020 by S&P Global Inc.,200523\n'
In [5]: x.decode('ISO-8859-1')
Out[5]: 'LF © 2020 by S&P Global Inc.,200523\n'
In [7]: print(x.decode('ISO-8859-1'))
LF © 2020 by S&P Global Inc.,200523
Question 2: As you can see, in Python 3 the outputs for x and print(x) are the same. So are those of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python 2 this is not the case. Why is there this difference between Python 2 and Python 3?
Question 3: Why is the output of print(x) different in Python 2 and 3, while the output of x is the same?
Question 4: Why is the output of x.decode('ISO-8859-1') different in Python 2 and 3, while the print output is the same?
Upvotes: 1
Views: 79
Reputation: 40878
Question 1: why is the output for x and print(x) different?
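Just typing x into a REPL echoes its repr, which can be thought of as:
>>> print repr(x)
'LF \xa9 2020 by S&P Global Inc.,200523\n'
print(x), by contrast, writes the raw bytes of a Python 2 str straight to the terminal. The lone byte 0xA9 is not valid UTF-8, so a terminal that expects UTF-8 (presumably the case here) shows a replacement character. Decoding with ISO-8859-1 first turns that byte into the character ©, which print can then encode correctly for the terminal.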
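As a rough illustration of the difference between the echoed value and what a terminal displays, here is a minimal Python 3 sketch. The names echoed and shown_by_terminal are made up for this example, and decoding with errors='replace' is only an approximation of how a UTF-8 terminal treats an invalid byte, not a description of any particular terminal.

```python
# Python 3 sketch; x is the byte string from the question.
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# What the REPL echoes for a bare `x`: the repr, with escapes left visible.
echoed = repr(x)

# Assumption: a UTF-8 terminal substitutes U+FFFD for a byte it cannot
# decode. errors='replace' mimics that behaviour for the lone 0xA9 byte.
shown_by_terminal = x.decode('utf-8', errors='replace')

print(echoed)
print(shown_by_terminal, end='')
```

The first line printed keeps the \xa9 escape, while the second contains the replacement character in its place.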
Question 2: As you can see, in Python 3 the outputs for x and print(x) are the same. So are those of x.decode('ISO-8859-1') and print(x.decode('ISO-8859-1')). In Python 2 this is not the case. Why is there this difference between Python 2 and Python 3?
Because x is a bytes object in Python 3, and print() does not attempt to decode it: it prints str(x), which for bytes is the same as repr(x). The Python 3 bytes representation displays byte values above 127 using the corresponding escape sequence. In Python 2, x is a plain str, and print writes its bytes out unchanged.
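The Python 3 behaviour can be checked with a short sketch that captures what print() writes and compares it with repr(x). The variable names buffer, printed, and same_as_repr are only for illustration.

```python
import contextlib
import io

x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'

# Capture what print() writes so it can be compared with repr(x).
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print(x)
printed = buffer.getvalue()

# print() writes str(x) plus a newline; for bytes, str(x) equals repr(x).
same_as_repr = printed == repr(x) + '\n'
print(same_as_repr)
```

Note that running Python 3 with the -b flag emits a BytesWarning when str() is applied to bytes, which hints that printing bytes directly is rarely what you want.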
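Question 3: Why is the output of print(x) different in Python 2 and 3, while the output of x is the same?
Because repr(x) gives the same thing on Python 2 and 3, apart from the b prefix that Python 3 adds. The print(x) output differs because Python 2 writes the raw bytes, while Python 3 prints the repr.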
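Question 4: Why is the output of x.decode('ISO-8859-1') different in Python 2 and 3, while the print output is the same?
Because x.decode('ISO-8859-1') produces a unicode object in Python 2 and a str object in Python 3, whose __repr__() methods differ in how they display non-ASCII characters: Python 2 escapes them, while Python 3 shows printable ones as-is. print uses the text itself rather than its repr, so its output is the same in both versions.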
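The two display styles can be compared within Python 3 alone, as a rough sketch: repr() gives the Python 3 form, and the built-in ascii() escapes non-ASCII characters, which resembles the Python 2 unicode repr without the u prefix. The names text, py3_style, and escaped_style are illustrative only.

```python
x = b'LF \xa9 2020 by S&P Global Inc.,200523\n'
text = x.decode('ISO-8859-1')

# Python 3 repr keeps printable non-ASCII characters such as the
# copyright sign as they are.
py3_style = repr(text)

# ascii() escapes every non-ASCII character, similar to how Python 2
# displays a unicode object (minus the leading u).
escaped_style = ascii(text)

print(py3_style)
print(escaped_style)
```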
If you want a more thorough read on all of this, check out Unicode & Character Encodings in Python: A Painless Guide. (Disclosure: I wrote it.)
Upvotes: 1