laurt

Reputation: 1891

Writing Unicode to file with Python

My problem is that I can print Unicode characters to my terminal, but not redirect them into a file. Demonstration:

user@ubuntu:~$ python -c 'print u"\u5000"'
倀
user@ubuntu:~$ python -c 'print u"\u5000"' >a.out
Traceback (most recent call last):
  File "<string>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u5000' in position 0: ordinal not in range(128)

Output of "locale":

LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Upvotes: 4

Views: 792

Answers (2)

laurt

Reputation: 1891

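The problem was actually with Python, not with the locale. When stdout is redirected to a file, Python 2 does not pick up the terminal's encoding and falls back to ASCII. One solution is to set the environment variable PYTHONIOENCODING=utf_8 before running the command.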
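As a rough sketch of what that setting does, the snippet below starts a child interpreter with PYTHONIOENCODING set and captures its stdout through a pipe, which behaves like a redirect to a file. The helper name run_child is made up for illustration and is not part of any library.

```python
import os
import subprocess
import sys


def run_child(encoding):
    """Run a child Python with PYTHONIOENCODING set and return its raw stdout bytes.

    run_child is a hypothetical helper used only for this illustration.
    """
    env = dict(os.environ)              # copy, so the parent environment is untouched
    env["PYTHONIOENCODING"] = encoding  # the setting suggested in this answer
    # The child writes U+5000 to stdout without a trailing newline.
    code = 'import sys; sys.stdout.write(u"\\u5000")'
    return subprocess.check_output([sys.executable, "-c", code], env=env)


out = run_child("utf_8")
# out holds the UTF-8 encoding of U+5000 instead of a UnicodeEncodeError
```

The same effect on the command line should come from prefixing the original command with PYTHONIOENCODING=utf_8, assuming a POSIX shell.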

Upvotes: 1

chepner

Reputation: 531345

Because your terminal is set to use UTF-8, Python knows how to encode a Unicode character when writing directly to the terminal. When the output is redirected to a file, however, no encoding is detected for stdout, so Python falls back to ASCII. To write to the file, you need to encode the string to bytes explicitly:

python -c 'print u"\u5000".encode("UTF-8")' >a.out
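If the script can open the file itself instead of relying on shell redirection, another option is to give the file an explicit encoding. This is only a sketch: write_unicode and the temporary path are names invented for the example, and io.open is used because it accepts an encoding argument on both Python 2.6+ and Python 3.

```python
import io
import os
import tempfile


def write_unicode(path, text):
    """Write a unicode string to path as UTF-8 (hypothetical helper for illustration)."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


# Assumed scratch location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "a.out")
write_unicode(path, u"\u5000")

# Read the file back in binary mode to inspect the bytes on disk.
with io.open(path, "rb") as f:
    data = f.read()
```

Here the encoding is a property of the file object, so the string does not need a separate .encode() call at every write.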

Upvotes: 4
