Reputation: 1057
I'm trying to read in Python3 a text file specifying encoding cp1252 which has unmapped characters (for instance byte 0x8d).
with open(inputfilename, mode='r', encoding='cp1252') as inputfile:
print(inputfile.readlines())
I obviously get the following exception:
Traceback (most recent call last):
File "test.py", line 9, in <module>
print(inputfile.readlines())
File "/usr/lib/python3.6/encodings/cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 14: character maps to <undefined>
I'd like to understand why, when reading the same file with encoding latin-1, I don't get the same exception and the byte 0x8d is represented as hex string:
$ python3 test.py
['This is a test\x8d file\n']
As far as i know byte 0x8d does not have a match on both encodings (latin-1 and cp1252). What am I missing? Why Python3 behaviour is different?
Upvotes: 2
Views: 1144
Reputation: 2755
from the docs: The simplest text encoding (called 'latin-1' or 'iso-8859-1') maps the code points 0–255 to the bytes 0x0–0xff
https://docs.python.org/3/library/codecs.html
Upvotes: 0