eran

Reputation: 15136

Printing Hebrew in Python works in Eclipse but not in the shell

I have some code that reads a text file containing Unicode escape sequences for Hebrew text and converts them into Hebrew for display.

For example:

import sys

f = open(sys.argv[1])
for line in f:
    print eval('u"' + line + '"')

This works fine when I run it in PyDev (Eclipse), but when I run it from the command line, I get

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 9-10: ordinal not in range(256)

An example line from the input file is:

\u05d9\u05d5\u05dd

What is the problem? How can I solve this?

Upvotes: 0

Views: 1274

Answers (2)

Martijn Pieters

Reputation: 1122242

Do not use eval(); instead use the unicode_escape codec to interpret that data:

for line in f:
    line = line.decode('unicode_escape')

The unicode_escape encoding interprets \uabcd character sequences the same way Python would when parsing a unicode literal in the source code:

>>> '\u05d9\u05d5\u05dd'.decode('unicode_escape')
u'\u05d9\u05d5\u05dd'
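As a rough sketch of how the whole file could be handled this way, the helpers below read the input in binary mode and decode each line. The function names `decode_escaped_line` and `decode_escaped_file` are made up for illustration, and the sketch assumes the file contains only ASCII characters plus \uXXXX escapes:

```python
import codecs


def decode_escaped_line(raw_line):
    # Drop the trailing newline, then let the codec interpret the \uXXXX escapes.
    return codecs.decode(raw_line.rstrip(b"\r\n"), "unicode_escape")


def decode_escaped_file(path):
    # Binary mode hands the codec raw bytes on both Python 2 and Python 3.
    with open(path, "rb") as handle:
        return [decode_escaped_line(raw_line) for raw_line in handle]
```

Each returned item is a unicode string, so printing it is still subject to the terminal encoding.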

The exception you see is not caused by the eval() call though; I suspect it is caused by the attempt to print the result. When you print a unicode value, Python detects what encoding the current terminal uses and encodes the value to that encoding automatically.

Your Eclipse output window uses a different encoding from your terminal; if the latter is configured to support Latin-1 then you'll see that exact exception, as Python tries to encode Hebrew codepoints to an encoding that doesn't support those:

>>> u'\u05d9\u05d5\u05dd'.encode('latin1')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

The solution is to reconfigure your terminal (UTF-8 would be a good choice), or to not print unicode values with codepoints that cannot be encoded to Latin-1.
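If reconfiguring the terminal is not an option, one possible workaround (not something the question asked for) is to encode the text yourself and replace characters the terminal cannot represent. This is only a sketch; `encode_for_stream` and its `fallback` parameter are hypothetical names:

```python
import sys


def encode_for_stream(text, stream=None, fallback="utf-8"):
    # Use the encoding the stream reports; it can be None when no terminal is attached.
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None) or fallback
    # errors="replace" substitutes "?" for characters the encoding cannot represent,
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, "replace")
```

On a Latin-1 terminal the Hebrew letters would come out as question marks, which avoids the exception but loses the text, so switching the terminal to UTF-8 remains the better fix.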

If you are redirecting output from Python to a file, then Python cannot determine the output encoding automatically, and Python 2 falls back to ASCII. In that case you can use the PYTHONIOENCODING environment variable to tell Python what encoding to use for standard I/O:

PYTHONIOENCODING=utf-8 python yourscript.py > outputfile.txt
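An alternative that does not depend on the environment is to have the script write the file itself with an explicit encoding. A minimal sketch, assuming the lines have already been decoded to unicode; `write_lines_utf8` is a hypothetical helper name:

```python
import io


def write_lines_utf8(path, lines):
    # io.open takes an encoding argument on Python 2.6+ and is the builtin open on Python 3.
    with io.open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            # Each line must be a unicode string; the file object encodes it to UTF-8.
            handle.write(line + u"\n")
```

With this approach the output encoding is fixed in the code, so it behaves the same whether or not a terminal is attached.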

Upvotes: 4

eran

Reputation: 15136

Thank you, this solved my problem.

line.decode('unicode_escape')

did the trick.

Follow-up: this now works in the terminal, but if I try to send the output to a file:

python myScript.py > textfile.txt

I get this error instead of the expected output:

'ascii' codec can't encode characters in position 42-44: ordinal not in range(128)

Upvotes: 0
