Reputation: 4330
Is there a way (on both Python 2 and Python 3) to configure tmp_stdout to use a different encoding?
(I know that Python 3 has the encoding parameter, but that is not available in Python 2.)
import tempfile
import sys
original_stdout = sys.stdout
with tempfile.TemporaryFile(mode="w+") as tmp_stdout:
    # patch sys.stdout
    sys.stdout = tmp_stdout
    print("📙")
    tmp_stdout.seek(0)
    actual_output = tmp_stdout.read()
    # restore stdout
    sys.stdout = original_stdout
Also, why is the default encoding on Windows cp1252 even when my Command Prompt uses cp850?
This is the error you get when you run it on Windows with Python 3.6:
Traceback (most recent call last):
  File "Desktop\test.py", line 11, in <module>
    print("📙")
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\tempfile.py", line 483, in func_wrapper
    return func(*args, **kwargs)
  File "C:\Users\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4d9' in position 0: character maps to <undefined>
Upvotes: 0
Views: 1091
Reputation: 34270
The Windows console defaults to the system OEM codepage (e.g. 850 in Western Europe), which supports legacy DOS programs and batch scripts, but serves no real purpose nowadays. Python 3.6+ uses the console's Unicode API instead. Internally this is UTF-16LE, but at the buffer/raw layer it presents as UTF-8 for cross-platform compatibility. To get similar support in Python 2, install and enable win_unicode_console.
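A quick way to see what Python has picked for the console, and what it would use by default for an ordinary file, is to print both values. This is only a minimal sketch: the variable names are made up for illustration, and the values it prints depend on the machine, the Python version, and whether stdout is redirected.

```python
from __future__ import print_function  # no effect on Python 3

import locale
import sys

# Encoding Python associates with stdout. This can be None in Python 2
# when stdout is redirected to a file or pipe.
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Locale-based encoding that Python 3 uses by default for files opened
# in text mode when no explicit encoding is given.
preferred_encoding = locale.getpreferredencoding(False)

print("stdout encoding:   ", stdout_encoding)
print("preferred encoding:", preferred_encoding)
```

If the two values differ, text that prints fine in the console can still fail to encode when the same text is written to a file opened with the default encoding.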
For non-console files, the default encoding in Python 3 is the system ANSI codepage (e.g. 1252 in Western Europe). This is the classic default for many text editors in Windows, such as Notepad. To get the full range of Unicode, override the encoding using the argument encoding='utf-8'. To support this in both Python 2 and 3, you can wrap the file descriptor (i.e. fileno()) using the io module, which was backported when Python 3 was released (2.6+). For example:
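# -*- coding: utf-8 -*-
# The __future__ imports only matter on Python 2, where print() must
# pass unicode text (not bytes) to the io wrapper.
from __future__ import print_function, unicode_literals
import io
import sys
import tempfile
with tempfile.TemporaryFile(mode='w+b') as tmp:
    tmp_stdout = io.open(tmp.fileno(), mode='w+', encoding='utf-8', closefd=False)
    sys.stdout, original_stdout = tmp_stdout, sys.stdout
    try:
        print("📙")
    finally:
        sys.stdout = original_stdout
    tmp_stdout.seek(0)
    actual_output = tmp_stdout.read()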
Note that the temp file is opened with the mode "w+b". This avoids the C runtime's low-level text mode in Python 2 on Windows, which we don't want because it treats the character 0x1A (i.e. Ctrl+Z) as an end-of-file marker (a legacy from DOS and CP/M) and does newline translation (e.g. LF -> CRLF). The io module's TextIOWrapper already implements newline translation. Note also that the io.open call uses closefd=False, since tmp is already closed automatically by the with statement.
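If you need this in more than one place, the same steps can be wrapped in a context manager. The following is a rough sketch rather than a tested library: capture_stdout and the one-element list it yields are hypothetical names chosen for illustration, and the emoji is written as an escape so the source file needs no encoding declaration.

```python
from __future__ import print_function, unicode_literals  # for Python 2

import contextlib
import io
import sys
import tempfile


@contextlib.contextmanager
def capture_stdout(encoding="utf-8"):
    """Redirect sys.stdout to a temp file that uses the given encoding.

    Yields a list; the captured text is appended to it when the block exits.
    (Hypothetical helper, not part of the standard library.)
    """
    captured = []
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        # Wrap the binary temp file's descriptor in a text layer.
        # closefd=False leaves closing the descriptor to the with statement.
        wrapper = io.open(tmp.fileno(), mode="w+", encoding=encoding,
                          closefd=False)
        original_stdout = sys.stdout
        sys.stdout = wrapper
        try:
            yield captured
        finally:
            # Restore stdout first, then read back what was written.
            sys.stdout = original_stdout
            wrapper.seek(0)
            captured.append(wrapper.read())
            wrapper.close()


with capture_stdout() as captured:
    print("\U0001F4D9")

actual_output = captured[0]
```

Yielding a list is just one simple way to hand the text back after the block has finished; returning the wrapper itself would not work, because the underlying file is closed by then.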
Upvotes: 3