Chintan Shah

Reputation: 955

Python uses 'ascii' codec in decoding where it should use 'UTF-8'

I have a piece of code:

with open('filename.txt','r') as textfile:
    kwList = [x.strip('\n') for x in textfile.readlines()]

I get a UnicodeDecodeError on line 2: 'ascii' codec can't decode byte 0xc4 in position 5595: ordinal not in range(128)

The problem is that, according to the Python docs (https://docs.python.org/3/library/functions.html#open), Python 3 uses locale.getpreferredencoding(False) to pick the default encoding when no encoding is passed to open().

When I run locale.getpreferredencoding(False), I get 'UTF-8'.

Why does the UnicodeDecodeError name the 'ascii' codec when Python should be using 'utf-8' here?

Upvotes: 2

Views: 1884

Answers (1)

Martijn Pieters

Reputation: 1121256

The locale is taken from the context; on POSIX systems, that means the environment variables (see the POSIX locale documentation). If you want to test which encoding Python will decide on, you need to reproduce the exact context of your production environment, including the environment variables it runs with.

You are probably running your program as a subprocess of something that only sets (or inherits) the effective user, but does not copy the environment for that user. Either an explicit locale has been set by that parent process or, if none is set, the default C locale is used. The default encoding for that locale is ASCII; some systems will report this by the name ANSI_X3.4-1968:

$ LANG=C python -c 'import locale; print(locale.getpreferredencoding(False))'
ANSI_X3.4-1968
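The same check can be scripted from Python, which is handy when comparing several candidate environments. The following is only a sketch: the helper name child_preferred_encoding is hypothetical, and it assumes Python 3.7 or later (for capture_output). It starts a child interpreter with the inherited locale variables removed and the given overrides applied, then returns whatever encoding the child reports.

```python
import os
import subprocess
import sys


def child_preferred_encoding(env_overrides):
    """Return the encoding a child Python reports under a modified environment.

    Hypothetical helper for illustration; not part of any library.
    """
    env = dict(os.environ)
    # Drop inherited locale settings so only the overrides decide the locale.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        env.pop(name, None)
    env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, "-c",
         "import locale; print(locale.getpreferredencoding(False))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Simulate a parent process that hands over only the C locale.
    print(child_preferred_encoding({"LC_ALL": "C"}))
```

The printed name depends on the platform and the Python version. Older Python 3 releases report an ASCII name under the C locale, as in the shell example above, while Python 3.7 and later may report UTF-8 there because of the UTF-8 mode and C locale coercion added in that release.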

If, for example, your production code is run from cron, then the user's usual environment variables are not loaded just because the job runs as a specific user. Set the LC_ALL environment variable explicitly at the top of your crontab:

LC_ALL=en_US.UTF-8

if your cron implementation supports setting variables this way, or set it on the command line you are going to run:

* * * * *    LC_ALL=nb_NO.UTF-8 /path/to/your/program

See Where can I set environment variables that crontab will use?
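Separately from fixing the environment, the code in the question can be made independent of the locale by passing the encoding to open() explicitly. This is a minimal sketch rather than part of the answer above: it assumes the file really is UTF-8, and the function name read_keywords is made up for illustration.

```python
import os
import tempfile


def read_keywords(path, encoding="utf-8"):
    """Read one keyword per line, decoding with an explicit encoding.

    Hypothetical helper; assumes the file is UTF-8 unless told otherwise.
    """
    # An explicit encoding means the locale is never consulted.
    with open(path, "r", encoding=encoding) as textfile:
        return [line.rstrip("\n") for line in textfile]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "filename.txt")
        # Sample bytes containing 0xc4, the byte from the error message.
        with open(path, "wb") as f:
            f.write(b"\xc4\x8caj\nk\xc4\x81ja\n")
        print(read_keywords(path))
```

With the encoding spelled out, the result no longer changes between an interactive shell, cron, or any other parent process that passes on a different locale.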

Upvotes: 2
