trinth

Reputation: 6047

Why doesn't setting the locale fix this UnicodeError?

I have the following Python script:

# -*- coding: utf-8 -*-
import sys, locale
locale.setlocale(locale.LC_ALL, 'en_US.utf8')
print '肥皂' # This works
print u'肥皂'

When running the script I get:

肥皂
Traceback (most recent call last):
  File "../pycli/samples/x.py", line 5, in <module>
    print u'肥皂'
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: ordinal not in range(256)

However, when I explicitly set the LC_ALL environment variable in the shell before running the script, it works:

export LC_ALL=en_US.utf8

So why doesn't setlocale() have the same effect?

Upvotes: 6

Views: 871

Answers (2)

Thomas Orozco

Reputation: 55197

Unicode is an abstract representation of text that exists only inside your program.

Its advantage is that it can represent any character. Its disadvantage is that it can't be output as-is: it must first be encoded into some encoding that the destination can display.

So any input you receive arrives encoded and you have to decode it, and any unicode you want to output has to be encoded first.

If you don't do it, Python will try to do it for you (using ASCII, or whatever encoding it finds in your environment, as in your case), but you shouldn't rely on this, because Python might get it wrong (as it did in your case).

Funnily enough, in your case the terminal supports UTF-8, but Python didn't realize it could use UTF-8.

That's why you should always encode output and decode input explicitly (preferably using UTF-8 when possible).

You can do this with the encode method of unicode objects and the decode method of byte strings, passing the encoding name as the argument.
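As a minimal sketch of that round trip (the variable names are only illustrative, and UTF-8 is assumed as the target encoding because that is what the terminal in the question uses):

```python
# -*- coding: utf-8 -*-

text = u'肥皂'                       # unicode text, as it lives inside the program

encoded = text.encode('utf-8')      # bytes that a UTF-8 terminal or file can take
decoded = encoded.decode('utf-8')   # what you would do with bytes read as input

# Each of these two characters takes three bytes in UTF-8.
assert len(encoded) == 6
assert decoded == text
```

In the Python 2 script from the question, printing text.encode('utf-8') instead of the bare unicode object should hand those bytes to the terminal without any implicit encoding step.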

Upvotes: 1

Ignacio Vazquez-Abrams

Reputation: 798536

The locale value is only used to choose the default charset for output when the interpreter starts up. In other words, by the time your script is running and calls setlocale(), it is too late.
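One way to observe this, as a rough sketch rather than a definitive test, is to compare sys.stdout.encoding before and after the setlocale() call. The locale name is the one from the question and is assumed to be installed; if it is not, the call is simply skipped:

```python
import locale
import sys

# Encoding chosen for stdout when the interpreter started (may be None when piped).
before = getattr(sys.stdout, 'encoding', None)

try:
    locale.setlocale(locale.LC_ALL, 'en_US.utf8')  # same call as in the question
except locale.Error:
    pass  # locale not available here; the comparison below is still meaningful

# setlocale() does not touch the already-created stdout object.
after = getattr(sys.stdout, 'encoding', None)

print('stdout encoding before setlocale: %r' % (before,))
print('stdout encoding after setlocale:  %r' % (after,))
```

Both lines should report the same value. To change it, the setting has to be in place before Python starts, for example by exporting LC_ALL as in the question or by setting the PYTHONIOENCODING environment variable to utf-8 in the shell.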

Upvotes: 2
