Mike

Reputation: 1225

Python 3 Default Encoding cp1252

I recently ran into problems decoding a file handle from the Biopython module (errors mapping bytes 0x81 and 0x8D). This is with an Anaconda 4.1.1 / Python 3.5.2 installation on a Sony VAIO running Windows 10.

After some research, it seems the problem may be that the default codec is cp1252. I ran the code below and confirmed that the default codec is indeed cp1252.

However, several posts suggest that Python 3 should have set the default codec to UTF-8. Is that correct? If so, why is mine cp1252, and how can I solve this?

    import locale

    os_encoding = locale.getpreferredencoding()
    print(os_encoding)  # prints cp1252 on this machine

Upvotes: 12

Views: 16732

Answers (1)

Josh Lee

Reputation: 177510

According to What’s New In Python 3.0,

There is a platform-dependent default encoding […] In many cases, but not all, the system default is UTF-8; you should never count on this default.

and

PEP 3120: The default source encoding is now UTF-8.

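In other words, Python reads source files as UTF-8 by default. Anything that involves files depends on the environment, including file names and file contents. On Windows, the default for file contents is the ANSI code page, such as cp1252. It's strongly recommended to pass the encoding explicitly when reading a file, e.g. open(filename, encoding='utf-8').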
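Here is a minimal sketch of how 0x81/0x8D errors like the ones in the question can arise. It assumes, hypothetically, that the data being read is actually UTF-8. In UTF-8, characters such as 'Á' (U+00C1) and 'Í' (U+00CD) encode to the byte pairs C3 81 and C3 8D, and bytes 0x81 and 0x8D have no mapping in cp1252. The sample text and the helper read_text are made up for illustration.

```python
import os
import tempfile


def read_text(path, encoding):
    """Hypothetical helper: read a whole text file using an explicit encoding."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Hypothetical sample: in UTF-8, 'Á' is b'\xc3\x81' and 'Í' is b'\xc3\x8d'.
sample = "ÁÍ"
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(sample.encode("utf-8"))

cp1252_error = None
try:
    # Same codec the question's Windows locale picks when encoding= is omitted.
    read_text(path, "cp1252")
except UnicodeDecodeError as exc:
    # Byte 0x81 has no mapping in cp1252, so decoding fails there.
    cp1252_error = exc
    print("cp1252 failed:", exc)

# Naming the file's real encoding removes the dependence on the locale.
decoded = read_text(path, "utf-8")
print("utf-8 round-trips:", decoded == sample)

os.remove(path)
```

With Biopython, the same idea means opening the handle yourself with the right encoding and passing it on. For example, Bio.SeqIO.parse accepts an already-open handle. If the data is really in some other encoding, name that one instead.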

Another change is that b'bytes'.decode() and 'str'.encode() with no argument use UTF-8 instead of ASCII.
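A small check of those no-argument defaults follows. Unlike open(), they do not depend on the locale. The sample character is arbitrary, and the output avoids printing non-ASCII text so it works on any console.

```python
# str.encode() and bytes.decode() default to UTF-8 in Python 3,
# regardless of the Windows code page.
text = "Á"
encoded_default = text.encode()
encoded_utf8 = text.encode("utf-8")
print(encoded_default)                    # b'\xc3\x81'
print(encoded_default == encoded_utf8)    # True
print(encoded_default.decode() == text)   # True

# ASCII, the old Python 2 default, cannot represent this character at all.
ascii_reason = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_reason = exc.reason
    print("ascii failed:", ascii_reason)  # ordinal not in range(128)
```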

Python 3.6 changes some more defaults:

PEP 529: Change Windows filesystem encoding to UTF-8

PEP 528: Change Windows console encoding to UTF-8
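To see which of these defaults apply on a given machine, you can print them, as in this sketch. On Windows with Python 3.6 or later, both are expected to report UTF-8. As I understand PEP 528, though, the console change only applies when the stream is attached to an interactive console, not when output is redirected to a file or pipe.

```python
import codecs
import sys

# Encoding used for file *names* (PEP 529 covers this on Windows).
fs_encoding = sys.getfilesystemencoding()
fs_codec = codecs.lookup(fs_encoding).name  # normalized codec name

# Encoding of standard output (PEP 528 covers the Windows console).
# stdout can be None (e.g. under pythonw.exe) or replaced, hence getattr.
stdout_encoding = getattr(sys.stdout, "encoding", None)

print("filesystem encoding:", fs_encoding, "->", fs_codec)
print("stdout encoding:", stdout_encoding)
```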

But the default encoding for open() is still whatever Python manages to infer from the environment.
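One way to see what open() infers is to open a scratch file without an encoding argument and inspect the resulting file object's encoding attribute. This is a sketch. On the question's Windows 10 / Python 3.5 setup, the default would presumably show cp1252, matching locale.getpreferredencoding().

```python
import locale
import os
import tempfile

# Create an empty scratch file just to see which codec open() picks.
fd, path = tempfile.mkstemp()
os.close(fd)

with open(path) as f:  # no encoding= argument: inferred from the environment
    default_open_encoding = f.encoding

with open(path, encoding="utf-8") as f:  # explicit: same on every machine
    explicit_open_encoding = f.encoding

os.remove(path)

print("open() default:  ", default_open_encoding)
print("locale preferred:", locale.getpreferredencoding(False))
print("explicit:        ", explicit_open_encoding)
```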

Python 3.7 adds an (opt-in!) UTF-8 Mode. In this mode, the environmental locale encoding is ignored and everything is UTF-8 all the time (except for specific cases where Windows uses UTF-16, I suppose). It is enabled with the -X utf8 command-line option or the PYTHONUTF8=1 environment variable. See PEP 540 and the corresponding Issue 29240.
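The sketch below compares a child interpreter started normally with one started with -X utf8. This option requires Python 3.7 or later, and it overrides a PYTHONUTF8 setting inherited from the environment. The names CHILD and probe are hypothetical. Each child reports sys.flags.utf8_mode and the codec that open() picks when no encoding is given.

```python
import codecs
import subprocess
import sys

# Child script: report whether UTF-8 Mode is on and which codec open() picks.
CHILD = (
    "import os, sys, tempfile\n"
    "fd, path = tempfile.mkstemp()\n"
    "os.close(fd)\n"
    "with open(path) as f:\n"
    "    enc = f.encoding\n"
    "os.remove(path)\n"
    "print(sys.flags.utf8_mode, enc)\n"
)


def probe(extra_args=()):
    """Run CHILD in a fresh interpreter; return (utf8_mode flag, codec name)."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    flag, enc = result.stdout.split()
    # codecs.lookup normalizes spellings such as 'UTF-8' to 'utf-8'.
    return int(flag), codecs.lookup(enc).name


if sys.version_info >= (3, 7):
    print("default: ", probe())
    print("-X utf8: ", probe(["-X", "utf8"]))
```

The default line depends on the machine, which is the point of the answer above. The -X utf8 line should report UTF-8 Mode on and a utf-8 codec everywhere.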

Upvotes: 12
