Reputation: 1225
I recently ran into problems decoding a handle from the Biopython module (errors about mapping bytes 0x81 and 0x8D) on a Sony Vaio running Windows 10, with an Anaconda 4.1.1 / Python 3.5.2 installation.
After some research, the likely cause seems to be that the default decoding codec is cp1252. I ran the code below and confirmed that the default codec is indeed cp1252.
However, several posts suggest that Python 3 should set the default codec to UTF-8. Is that correct? If so, why is mine cp1252, and how can I fix this?
import locale
os_encoding = locale.getpreferredencoding()
print(os_encoding)  # shows cp1252 on my system
Upvotes: 12
Views: 16732
Reputation: 177510
According to What’s New In Python 3.0,
There is a platform-dependent default encoding […] In many cases, but not all, the system default is UTF-8; you should never count on this default.
and
PEP 3120: The default source encoding is now UTF-8.
In other words, Python reads source files as UTF-8 by default, but any interaction with the filesystem depends on the environment. It's strongly recommended to use open(filename, encoding='utf-8') to read a file.
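To make that concrete, here is a minimal sketch of why the explicit argument matters. The helper name read_text and the temporary sample file are my own, purely for illustration. It writes a one-character UTF-8 file and reads it back twice: once as UTF-8, and once as cp1252, which should fail because byte 0x81 has no mapping in Python's cp1252 codec (the same kind of error as in the question).

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a whole text file with an explicit encoding (hypothetical helper)."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


# Write a small UTF-8 file; "Á" (U+00C1) is stored as the bytes 0xC3 0x81.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as handle:
    handle.write("Á")

# Explicit UTF-8 decodes the same way on every platform.
text = read_text(sample_path)

# Forcing cp1252 mimics what a Windows default can do to the same bytes.
try:
    read_text(sample_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    # Byte 0x81 is undefined in cp1252, so decoding raises.
    cp1252_failed = True

os.remove(sample_path)
```

If the handle comes from a library rather than your own open() call, the same idea applies: open the file yourself with the right encoding and pass the handle in, assuming the library accepts an already opened text handle.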
Another change is that b'bytes'.decode() and 'str'.encode() with no argument use utf-8 instead of ascii.
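A quick way to check this for yourself (the variable names are just for illustration):

```python
import sys

# With no argument, both directions use UTF-8 in Python 3,
# regardless of the locale encoding of the machine.
encoded = "é".encode()          # equivalent to "é".encode("utf-8")
decoded = b"\xc3\xa9".decode()  # equivalent to .decode("utf-8")

# This is the default that str.encode() and bytes.decode() rely on.
default_codec = sys.getdefaultencoding()
```

Note that this default is separate from the one open() uses, which is why cp1252 can still show up when reading files.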
Python 3.6 changes some more defaults:
PEP 529: Change Windows filesystem encoding to UTF-8
PEP 528: Change Windows console encoding to UTF-8
But the default encoding for open() is still whatever Python manages to infer from the environment.
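To see which of these defaults apply on a given machine, something like the sketch below can help. The function name describe_encodings and the dictionary keys are mine. It also assumes that locale.getpreferredencoding(False) reflects what open() falls back to, which is how I understand the documented behaviour for these Python versions.

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings Python has picked up (hypothetical helper)."""
    return {
        # Used by str.encode() / bytes.decode() with no argument.
        "default": sys.getdefaultencoding(),
        # Used to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
        # Assumed fallback for open() when no encoding is given.
        "locale": locale.getpreferredencoding(False),
        # Encoding of the console or pipe attached to stdout, if any.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in describe_encodings().items():
    print(name, "->", value)
```

On a Windows setup like the one in the question, I would expect the "locale" entry to be cp1252 while "default" is utf-8.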
It appears that 3.7 will add an (opt-in!) mode where the environmental locale encoding is ignored, and everything is all UTF-8 all the time (except for specific cases where Windows uses UTF-16, I suppose). See PEP 540 and the corresponding Issue 29240.
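As far as I can tell from PEP 540, the mode is switched on with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable, and a script can detect it through sys.flags.utf8_mode. A small sketch, where utf8_mode_enabled is a name I made up and the getattr fallback covers interpreters older than 3.7 that lack the flag:

```python
import sys


def utf8_mode_enabled():
    """Return True if the interpreter runs in UTF-8 mode (hypothetical helper)."""
    # sys.flags.utf8_mode only exists on Python 3.7+, so fall back to 0
    # on older interpreters such as the 3.5.2 from the question.
    return bool(getattr(sys.flags, "utf8_mode", 0))


if utf8_mode_enabled():
    print("UTF-8 mode is on: the locale encoding is ignored")
else:
    print("UTF-8 mode is off: pass encoding='utf-8' to open() explicitly")
```

Either way, passing the encoding explicitly to open() remains the option that does not depend on how the interpreter was started.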
Upvotes: 12