Reputation: 371
I'm working in WinXP 5.1.2600, writing a Python application involving Chinese pinyin, which has involved me in endless Unicode problems. Switching to Python 3.0 has solved many of them. But the print() function for console output is not Unicode-aware for some odd reason. Here's a teeny program.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
print('sys.stdout encoding is "' + sys.stdout.encoding + '"')
str1 = 'lüelā'
print(str1)
Output is (changing angle brackets to square brackets for readability):
sys.stdout encoding is "cp1252"
Traceback (most recent call last):
  File "TestPrintEncoding.py", line 22, in [module]
    print(str1)
  File "C:\Python30\lib\io.py", line 1491, in write
    b = encoder.encode(s)
  File "C:\Python30\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0101' in position 4: character maps to [undefined]
Note that ü = '\xfc' = 252 gives no problem since it fits in the 8-bit range that cp1252 covers (so-called upper ASCII). But ā = '\u0101' is beyond 8 bits, so cp1252 has no mapping for it.
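The difference between the two characters can be checked without involving the console at all. The following is only a minimal sketch; the variable names are made up for illustration, and the string is written with escapes so the example does not depend on the source file's encoding.

```python
# The same word as in the test program: 'l', u-umlaut, 'e', 'l', a-macron.
text = "l\u00fcel\u0101"

# U+00FC has a slot in cp1252, so encoding it succeeds.
encoded_u = "\u00fc".encode("cp1252")
print(encoded_u)  # b'\xfc'

# U+0101 has no slot in cp1252, so encoding the whole word fails there.
failure = None
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    failure = (exc.start, exc.object[exc.start])

print(failure)
```

The reported index matches the "position 4" in the traceback above.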
Anyone have an idea how to change the encoding of sys.stdout to 'utf-8'? Bear in mind that Python 3.0 no longer uses the codecs module, if I understand the documentation right.
(Note that the coding specified by the "coding:" line is the coding of the source code, not of the console output. But thank you for your thoughts!)
Upvotes: 18
Views: 21651
Reputation: 717
The problem of displaying Unicode characters in Python on Windows is known. There is no official solution yet. The right thing to do is to use the WinAPI function WriteConsoleW. It is nontrivial to build a working solution, as there are other related issues. However, I have developed a package which tries to fix Python regarding this issue. See https://github.com/Drekin/win-unicode-console. You can also read a deeper explanation of the problem there. The package is also on PyPI (https://pypi.python.org/pypi/win_unicode_console) and can be installed using pip.
Upvotes: 1
Reputation: 13467
Here's a dirty hack:
# works
import os
os.system("chcp 65001 &")
print("юникод")
However, almost anything breaks it.
Simply muting the output of the first line already breaks it:
# doesn't work
import os
os.system("chcp 65001 >nul &")
print("юникод")
Checking the OS type breaks it:
# doesn't work
import os
if os.name == "nt":
    os.system("chcp 65001 &")
print("юникод")
It doesn't even work when print is inside the if block:
# doesn't work
import os
if os.name == "nt":
    os.system("chcp 65001 &")
    print("юникод")
But one can print with cmd's echo:
# works
import os
os.system("chcp 65001 & echo {0}".format("юникод"))
And here's a simple way to make this cross-platform:
# works
import os
def simple_cross_platform_print(obj):
    if os.name == "nt":
        # Switch the console to code page 65001 (UTF-8), then let cmd's echo print the text.
        # Note: obj is passed to the shell unescaped.
        os.system("chcp 65001 >nul & echo {0}".format(obj))
    else:
        print(obj)
simple_cross_platform_print("юникод")
However, the trailing empty line from Windows' echo can't be suppressed.
Upvotes: 1
Reputation: 2669
You may want to try setting the environment variable PYTHONIOENCODING to "utf_8". I have written a page on my ordeal with this problem.
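To see what PYTHONIOENCODING does, one option is to start a child interpreter with the variable set and look at the raw bytes it writes. This is only a sketch: the helper name run_with_io_encoding is hypothetical, and it assumes a Python recent enough to have subprocess.run. The child's text is written with escapes so nothing depends on how the command line itself is encoded.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding="utf-8"):
    """Run `code` in a child Python with PYTHONIOENCODING set; return its raw stdout bytes.

    Hypothetical helper, for illustration only.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child prints the same word as in the question.
raw = run_with_io_encoding("print('l\\u00fcel\\u0101')")
print(raw)
```

With the variable set, the child encodes its output as UTF-8 instead of raising UnicodeEncodeError. Whether a given console then renders those bytes correctly is a separate matter that depends on its code page and font.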
Upvotes: 12
Reputation: 29342
Check out the question and answer here; I think they have some valuable clues. Specifically, note the setdefaultencoding function in the sys module, but also the fact that you probably shouldn't use it.
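A different route that leaves the interpreter-wide default alone is to wrap the underlying binary stream in a new text layer that encodes as UTF-8. The sketch below is an assumption about how that might look, not something taken from the linked answer; the helper name utf8_text_stream is hypothetical, and the demonstration writes to an in-memory buffer rather than the real console.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8.

    Hypothetical helper, for illustration only.
    """
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


# Demonstration against an in-memory buffer instead of the console.
buffer = io.BytesIO()
stream = utf8_text_stream(buffer)
print("l\u00fcel\u0101", file=stream)
stream.flush()
encoded = buffer.getvalue()
print(encoded)

# Applied to the console, the same idea would be (untested assumption):
#     import sys
#     sys.stdout = utf8_text_stream(sys.stdout.buffer)
```

This only changes which bytes Python emits. The console still has to interpret them as UTF-8 for the characters to show up correctly.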
Upvotes: 2
Reputation: 3764
The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python is handling them correctly internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly.
See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
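If the output has to stay in cmd.exe, a program can at least avoid the crash by degrading characters the console encoding cannot represent. The following is a minimal sketch under that assumption; the function name safe_for_console is made up for illustration, and "cp1252" stands in for whatever sys.stdout.encoding reports.

```python
def safe_for_console(text, encoding):
    """Return `text` with characters that `encoding` cannot represent turned into escapes.

    Hypothetical helper, for illustration only.
    """
    # backslashreplace substitutes an escape such as \u0101 for each
    # unencodable character instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


printable = safe_for_console("l\u00fcel\u0101", "cp1252")
print(printable.encode("ascii", errors="backslashreplace"))
```

The ü survives because cp1252 can represent it, while the ā comes out as the literal text \u0101. That is lossy, so it is a fallback rather than a fix.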
Upvotes: 15