Reputation: 993095
Setting the default output encoding in Python 2 is a well-known idiom:
sys.stdout = codecs.getwriter("utf-8")(sys.stdout)
This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.
However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
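A minimal reproduction of that failure, using io.StringIO as a stand-in for Python 3's text-mode sys.stdout (the stand-in is an assumption, chosen so the error can be caught and inspected rather than crashing the session):

```python
import codecs
import io

text_stream = io.StringIO()  # stand-in for the text-mode sys.stdout of Python 3
writer = codecs.getwriter("utf-8")(text_stream)

try:
    # StreamWriter.write() encodes the str to bytes, then passes those
    # bytes to the wrapped stream, which only accepts str.
    writer.write("ûnicöde")
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```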
What is the correct way to do this in Python 3?
Upvotes: 89
Views: 78302
Reputation: 172249
sys.stdout is in text mode in Python 3. Hence you write unicode to it directly, and the idiom for Python 2 is no longer needed.
Where this would fail in Python 2:
>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)
However, it works just dandy in Python 3:
>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7
Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of Python.
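A small sketch of what text mode means in practice. It prints the encoding Python chose for the real sys.stdout, then uses an io.TextIOWrapper over io.BytesIO as a hypothetical stand-in for stdout so the encoded bytes can be inspected. Note that write() returns the number of characters written, which is the trailing 7 in the session above.

```python
import io
import sys

# The encoding Python picked for the real stdout at startup
# (getattr() because a replaced stdout might not have the attribute).
print("sys.stdout encoding:", getattr(sys.stdout, "encoding", None))

# Hypothetical stand-in: a text-mode stream over an in-memory byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="utf-8")

count = out.write("Ûnicöde")  # accepts str; returns the number of characters
out.flush()                   # the wrapper has encoded the text into raw
print(count, raw.getvalue())
```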
Upvotes: 8
Reputation: 229593
Since Python 3.7 you can change the encoding of standard streams with reconfigure():
sys.stdout.reconfigure(encoding='utf-8')
You can also modify how encoding errors are handled by adding an errors parameter.
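A minimal sketch of both points, assuming Python 3.7+. Reconfiguring the real sys.stdout has no visible effect in a demo, so the second half applies the same method to a hypothetical in-memory stand-in (an io.TextIOWrapper over io.BytesIO) where the encoded bytes can be inspected; errors="replace" is just one example of an error handler.

```python
import io
import sys

# Python 3.7+: switch the real standard output to UTF-8.
# reconfigure() is a method of io.TextIOWrapper; the hasattr() guard covers the
# case where sys.stdout has been replaced by some other kind of object.
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

# Hypothetical stand-in for stdout so the encoded bytes can be inspected.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

stream.reconfigure(encoding="utf-8")
stream.write("日本語\n")  # encoded as UTF-8

# The errors parameter controls what happens to unencodable characters:
# with "replace", each one becomes "?" instead of raising UnicodeEncodeError.
stream.reconfigure(encoding="ascii", errors="replace")
stream.write("日本語\n")

stream.flush()
print(raw.getvalue())
```

reconfigure() flushes the stream before applying the new settings, so text written before each call keeps the encoding that was active when it was written.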
Upvotes: 72
Reputation: 10886
Other answers seem to recommend using codecs, but open works for me:
import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())
This works even when I run it with PYTHONIOENCODING="ascii".
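The same approach, wrapped in a small hypothetical helper (utf8_text_writer is an illustrative name, not a library function) and demonstrated on an os.pipe() instead of the real stdout so the bytes can be read back. closefd=False is an addition of mine, not part of the answer: it keeps the descriptor open when the new file object is closed, so the descriptor still used by the original stream object is not closed behind its back.

```python
import os

def utf8_text_writer(fd):
    # Hypothetical helper: open a new text-mode file object on an existing fd.
    # buffering=1 selects line buffering (only allowed in text mode);
    # closefd=False (an assumption, not in the answer) leaves fd open on close.
    return open(fd, mode="w", encoding="utf-8", buffering=1, closefd=False)

# Demonstration on a pipe instead of the real stdout.
read_fd, write_fd = os.pipe()
out = utf8_text_writer(write_fd)
out.write("日本語\n")  # line buffering flushes at the newline
out.close()            # write_fd stays open because closefd=False
os.close(write_fd)     # close it ourselves so the reader sees end-of-file

with open(read_fd, "rb") as reader:
    data = reader.read()
print(data)
```

Applied to the real stream, this would be sys.stdout.flush() followed by sys.stdout = utf8_text_writer(sys.stdout.fileno()); flushing first keeps anything already buffered by the old object ahead of the new output.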
Upvotes: 38
Reputation: 48058
I found this thread while searching for solutions to the same error.
An alternative solution to those already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use, this is less trouble than swapping sys.stdout after Python is initialized:
PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py
This has the advantage of not requiring any edits to the Python code.
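To check what the variable does without touching the script itself, a small sketch (not from the answer) can start a child interpreter with PYTHONIOENCODING set and ask it to report sys.stdout.encoding and sys.stdout.errors. It assumes Python 3.7+ for the capture_output argument of subprocess.run():

```python
import os
import subprocess
import sys

# Child interpreter that reports how its stdout was configured at startup.
child_code = "import sys; print(sys.stdout.encoding, sys.stdout.errors)"

# Same setting as the shell example: encoding, then ":" and the error handler.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
encoding, errors = result.stdout.split()
print(encoding, errors)
```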
Upvotes: 45
Reputation: 57870
Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:
Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached
Instead, this worked fine for me:
default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')
(And, of course, you then write to default_out instead of stdout.)
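A self-contained sketch of writing through such a wrapper. An io.BytesIO stands in for sys.stdout.buffer (an assumption, so the output can be checked); the real line is shown as a comment. Two general properties of io.TextIOWrapper are worth keeping in mind here: it closes the buffer it wraps when it is closed or garbage-collected, so keep a reference to default_out for as long as the program runs; and since sys.stdout and default_out keep separate text buffers over the same binary buffer, flush one before switching to the other to keep output in order.

```python
import io

raw = io.BytesIO()  # stand-in for sys.stdout.buffer
# In a real script this would be:
#     default_out = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
default_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print("日本語", file=default_out)  # write to default_out, not sys.stdout
default_out.flush()                 # push the encoded bytes into the buffer
print(raw.getvalue())
```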
Upvotes: 13
Reputation: 536379
Setting the default output encoding in Python 2 is a well-known idiom
Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.
It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.
CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
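A minimal sketch of the bytes-only approach, assuming a text/html response. write_cgi_response is a hypothetical helper, not part of any library; it takes any binary stream (in a real CGI script, sys.stdout.buffer) and encodes the body in one place so it matches the charset parameter it advertises. An io.BytesIO stands in for stdout here so the output can be checked:

```python
import io

def write_cgi_response(out, body, charset="utf-8"):
    """Hypothetical helper: send a text/html CGI response as raw bytes.

    `out` must be a binary stream (sys.stdout.buffer in a real CGI script).
    The body is encoded once, here, to match the charset parameter
    advertised in the Content-Type header.
    """
    # Header block, terminated by a blank line.
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(charset)
    out.write(header.encode("ascii"))
    out.write(body.encode(charset))
    out.flush()

buf = io.BytesIO()  # stand-in for sys.stdout.buffer
write_cgi_response(buf, "<p>日本語</p>")
print(buf.getvalue())
```

For binary content such as an image, the same pattern applies without the encode step: write the header bytes, then the raw image bytes.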
(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)
Upvotes: 18
Reputation: 993095
Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:
The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach(), streams can be made binary by default. This function sets stdin and stdout to binary:
def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()
Therefore, the corresponding idiom for Python 3.1 and later is:
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
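A runnable sketch of that idiom. To keep it self-contained, a hypothetical stand-in (an io.TextIOWrapper over io.BytesIO) plays the role of sys.stdout, so the bytes that reach the underlying buffer can be inspected:

```python
import codecs
import io

raw = io.BytesIO()                                     # stand-in for the binary layer
text_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

# The idiom: detach the binary buffer and wrap it in a UTF-8 StreamWriter.
# detach() flushes the text layer and returns the underlying buffer;
# text_stdout cannot be used afterwards.
writer = codecs.getwriter("utf-8")(text_stdout.detach())
writer.write("日本語\n")  # StreamWriter encodes str to bytes and writes them
writer.flush()            # StreamWriter forwards flush() to the wrapped stream
print(raw.getvalue())
```

Because the StreamWriter writes bytes straight to the buffer, there is no newline translation: "\n" reaches the stream unchanged on every platform.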
Upvotes: 54