Reputation: 1891
My problem is that I can print Unicode characters to my terminal, but not to a file via shell redirection. Demonstration:
user@ubuntu:~$ python -c 'print u"\u5000"'
倀
user@ubuntu:~$ python -c 'print u"\u5000"' >a.out
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5000' in position 0: ordinal not in range(128)
Output of "locale":
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
Upvotes: 4
Views: 792
Reputation: 1891
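The problem turned out to be with Python rather than with the shell or the locale. One solution is to set the environment variable PYTHONIOENCODING=utf_8, which tells Python which encoding to use for stdin, stdout, and stderr.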
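As a rough sketch of how the variable takes effect, the snippet below starts a child interpreter with its stdout connected to a pipe (comparable to redirecting into a file) and PYTHONIOENCODING set. The helper name run_child is made up for this illustration, and the child uses the print(...) form so the same code runs on Python 2 and Python 3.

```python
import os
import subprocess
import sys


def run_child(extra_env):
    """Run a child interpreter that prints U+5000 with stdout piped.

    Hypothetical helper for illustration; returns (exit code, raw stdout bytes).
    """
    env = dict(os.environ)
    env.update(extra_env)
    proc = subprocess.Popen(
        [sys.executable, "-c", 'print(u"\\u5000")'],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    out, _err = proc.communicate()
    return proc.returncode, out


# With the variable set, the child encodes its output as UTF-8
# even though stdout is not a terminal.
returncode, out = run_child({"PYTHONIOENCODING": "utf_8"})
print(returncode, repr(out))
```

In a shell, the equivalent would be to prefix the original command with the variable, for example PYTHONIOENCODING=utf_8 python -c 'print u"\u5000"' >a.out.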
Upvotes: 1
Reputation: 531345
Because your terminal is set to use UTF-8, Python knows how to encode a Unicode character when writing directly to the terminal. When stdout is redirected to a file, however, Python 2 has no terminal encoding to detect, so it falls back to ASCII, which cannot represent U+5000. To write to the file, you need to encode the string to bytes explicitly:
python -c 'print u"\u5000".encode("UTF-8")' >a.out
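Inside a script, the same idea can be sketched by opening the file with an explicit encoding instead of relying on stdout. This is only an illustration under a few assumptions: the temporary directory and the variable names are made up here, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile

text = u"\u5000"

# Encoding to ASCII fails for the same reason the redirected print did.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Hypothetical output path in a temporary directory, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "a.out")

# An explicit encoding means the result no longer depends on the locale
# or on whether stdout is a terminal.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text + u"\n")

# Read the file back as raw bytes to see what was actually written.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

Compared with calling .encode("UTF-8") at every print, opening the file with an encoding keeps the rest of the program working with Unicode strings and does the conversion in one place.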
Upvotes: 4