Reputation: 103
I'm trying to redirect the output of a Python script to a file. When the output contains non-ASCII characters, it works on macOS and Linux, but not on Windows.
I've reduced the problem to a simple test. The following is what is shown in the Windows command prompt window. The test is just one print call.
Microsoft Windows [Version 10.0.17134.472]
(c) 2018 Microsoft Corporation. All rights reserved.
D:\>set PY
PYTHONIOENCODING=utf-8
D:\>type pipetest.py
print('\u0422\u0435\u0441\u0442')
D:\>python pipetest.py
Тест
D:\>python pipetest.py > test.txt
D:\>type test.txt
Тест
D:\>type test.txt | iconv -f utf-8 -t utf-8
Тест
D:\>set PYTHONIOENCODING=
D:\>python pipetest.py
Тест
D:\>python pipetest.py > test.txt
Traceback (most recent call last):
  File "pipetest.py", line 1, in <module>
    print('\u0422\u0435\u0441\u0442')
  File "C:\Python\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>
D:\>python -V
Python 3.7.2
As one can see, setting the PYTHONIOENCODING environment variable helps, but I don't understand why it needs to be set. When the output goes to the terminal it works, but when the output goes to a file it fails. Why is cp1252 used when stdout is not a console?
Maybe it is a bug that can be fixed in the Windows version of Python?
Upvotes: 7
Views: 1660
Reputation: 103
Python needs to write binary data (not strings) to stdout, hence the need for an encoding.
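To illustrate the point about bytes, here is a minimal sketch (not Python's actual stdout code path) that encodes the test string from the question by hand. It assumes only the standard codecs; the variable names are mine.

```python
# The string printed by pipetest.py in the question.
text = '\u0422\u0435\u0441\u0442'

# UTF-8 can represent every Unicode character, so this succeeds.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)

# cp1252 has no Cyrillic letters, so encoding fails the same way
# the redirected print call does in the traceback.
try:
    text.encode('cp1252')
    cp1252_error = None
except UnicodeEncodeError as exc:
    cp1252_error = exc
    print(exc)
```

The second half fails with the same 'charmap' error as in the question, which suggests the redirected stdout was using cp1252.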
The encoding (used to convert strings into bytes) is determined differently on each platform; the sys.stdout documentation describes the rules.
(Thanks to @Eric Leung for the precise link.)
The follow-up question would be why Python on Windows uses the current system locale for non-Unicode programs rather than the code page set by the chcp command, but I will leave that for someone else.
It is also worth mentioning that there is a checkbox titled "Beta: Use Unicode UTF-8..." in Region Settings on Windows 10 (to open it, press Win+R and type intl.cpl). With the checkbox checked, the above example works without error. However, this checkbox is off by default and buried deep in the system settings.
Upvotes: 0
Reputation: 161
According to the Python documentation, Python on Windows uses different character encodings for the console device (utf-8) and for non-character devices such as disk files and pipes (the system locale encoding). PYTHONIOENCODING can be used to override this.
https://docs.python.org/3/library/sys.html#sys.stdout
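To see which case applies on a given machine, something like the following sketch may help. It reports what Python picked for stdout; `describe_stream` is a hypothetical helper name of mine, not part of the standard library.

```python
import locale
import sys


def describe_stream(stream):
    """Return encoding-related facts about a text stream (hypothetical helper)."""
    isatty = getattr(stream, "isatty", None)
    return {
        # Encoding Python chose for this stream (None for some stream types).
        "encoding": getattr(stream, "encoding", None),
        # True when the stream is attached to a console/terminal.
        "isatty": bool(isatty()) if callable(isatty) else False,
        # The locale's preferred encoding, for comparison.
        "locale_encoding": locale.getpreferredencoding(False),
    }


info = describe_stream(sys.stdout)
# Write to stderr so the report stays visible when stdout is redirected.
print(info, file=sys.stderr)
```

Running it once directly and once with `> test.txt` should show whether the encoding changes when stdout stops being a console.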
Another approach is to change the encoding directly in the program (reconfigure is available since Python 3.7). I tried it and it works fine.
import sys
sys.stdout.reconfigure(encoding='utf-8')
https://docs.python.org/3/library/io.html#io.TextIOWrapper.reconfigure
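A slightly more defensive variant, as a sketch: `ensure_utf8` is a hypothetical helper of mine that only calls reconfigure when the stream supports it. The demo uses an in-memory stream that starts out as cp1252, standing in for a redirected stdout on Windows.

```python
import io


def ensure_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True when the encoding was changed, False otherwise.
    (Hypothetical helper; reconfigure() itself exists since Python 3.7.)
    """
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False
    reconfigure(encoding="utf-8")
    return True


# Stand-in for a redirected stdout that defaulted to cp1252.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp1252")

changed = ensure_utf8(stream)
stream.write('\u0422\u0435\u0441\u0442')
stream.flush()
written = raw.getvalue()
```

In a real script the call would be `ensure_utf8(sys.stdout)` before the first print.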
Upvotes: 6