Reputation: 226
System: Python 3.4.2 on Linux.
I'm working on a Django application (the framework itself is irrelevant), and I ran into a problem where it throws
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
when print is called (!). After quite a bit of digging, I discovered I should check
>>> sys.getdefaultencoding()
'utf-8'
but it was UTF-8, as expected. I also noticed that os.path.exists
throws the same exception when used with a unicode string. So I checked
>>> sys.getfilesystemencoding()
'ascii'
When I used LANG=en_US.UTF-8
the issue disappeared. I understand now why os.path.exists
had problems with that. But I have absolutely no clue why print
is affected by the filesystem setting. Is there a third setting I'm missing? Or does it just assume that the LANG
environment variable is to be trusted for everything?
Also... I don't get the reasoning here. LANG
does not say which encoding the filenames use. It has nothing to do with that. It's set separately for the current environment, not for the filesystem. Why is Python using this setting for filenames? It makes applications very fragile, as all file operations break when run in an environment where LANG
is not set or set to C
(not uncommon, especially when a web-app is run as root or a new user created specifically for the daemon).
Test code (no actual unicode input needed to avoid terminal encoding pitfalls):
x=b'\xc4\x8c\xc5\xbd'
y=x.decode('utf-8')
print(y)
Question:
- Why does Python take the filesystem encoding from the LANG setting?
- Why is print affected?
Upvotes: 1
Views: 598
Reputation: 1124110
LANG
is used to determine your locale; if you don't set specific LC_
variables, the LANG
variable is used as the default.
The filesystem encoding is determined by the LC_CTYPE
variable, but if you haven't set that variable specifically, the LANG
environment variable is used instead.
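The fallback described above can be sketched as a small lookup. This is only an illustration of the POSIX precedence rules, not code taken from Python itself: the helper name effective_ctype is hypothetical, and it also checks LC_ALL, which is not mentioned above but overrides both of the other variables when it is set.

```python
def effective_ctype(env):
    """Return the locale string that governs LC_CTYPE for an environment mapping.

    Hypothetical helper: it mirrors the POSIX lookup order
    LC_ALL > LC_CTYPE > LANG and does not call into the C library.
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            return value
    return "C"  # POSIX default when none of the variables is set


# Only LANG is set, so it decides the character encoding.
print(effective_ctype({"LANG": "en_US.UTF-8"}))
# A specific LC_CTYPE takes priority over LANG.
print(effective_ctype({"LANG": "en_US.UTF-8", "LC_CTYPE": "C"}))
# Nothing set at all, as is common for daemon users.
print(effective_ctype({}))
```

Passing os.environ instead of a literal dict shows which variable is in effect for the current process.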
Printing uses sys.stdout
, a text file configured with the codec your terminal uses. Your terminal settings are also locale-specific; your LANG
variable should really reflect what locale your terminal is set to. If that is UTF-8, you need to make sure your LANG
variable reflects that. sys.stdout
uses locale.getpreferredencoding(False)
(like all text streams opened without an explicit encoding set) and on POSIX systems that'll use LC_CTYPE
too.
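To see why the stream's codec matters, here is a minimal sketch that reproduces the failure without touching the environment. It assumes that an in-memory io.TextIOWrapper is a fair stand-in for sys.stdout; the helper name try_print is made up for this example.

```python
import io

# The same two characters as in the question's test code.
text = b'\xc4\x8c\xc5\xbd'.decode('utf-8')


def try_print(encoding):
    """Print `text` to an in-memory text stream that uses `encoding`.

    Returns the bytes that were written, or the exception class name
    if the codec cannot represent the characters.
    """
    raw = io.BytesIO()
    # newline='\n' keeps the written bytes identical across platforms.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline='\n')
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError as exc:
        return type(exc).__name__
    return raw.getvalue()


# An ASCII-configured stream fails, just like stdout under LANG=C.
print(try_print('ascii'))
# A UTF-8 stream encodes the same text without complaint.
print(try_print('utf-8'))
```

In a real session, sys.stdout.encoding and locale.getpreferredencoding(False) report which codec the interpreter actually picked.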
Upvotes: 1