Reputation: 92449
If I execute the following Python 3.1 program, I see only � instead of the correct characters in my browser. The file itself is UTF-8 encoded and the same encoding is sent with the response.
from wsgiref.simple_server import make_server
page = "<html><body>äöü€ßÄÖÜ</body></html>"
def application(environ, start_response):
    start_response("200 Ok", [("Content-Type", "text/html; charset=UTF-8")])
    return page
httpd = make_server('', 8000, application)
print("Serving on port 8000...")
httpd.serve_forever()
"UTF-8" is set correctly in the response:
HTTP/1.0 200 Ok
Date: Mon, 09 Aug 2010 16:35:02 GMT
Server: WSGIServer/0.1 Python/3.1.1+
Content-Type: text/html; charset=UTF-8
What is wrong here?
Upvotes: 0
Views: 4791
Reputation: 536429
WSGI on Python 3 doesn't exist yet. The Web-SIG have still not reached any conclusion about how strings (bytes/unicode) are to be handled in Python 3.x.
wsgiref is largely an automated 2to3 conversion; it still has problems even apart from the factor of what WSGI on 3.x will actually mean. Don't rely on it as a reference to how WSGI apps will work under Python 3.
That the situation is still like this coming into the 3.2 release cycle is embarrassing and depressing.
return page
Well, whilst WSGI for 3.x is still an unknown factor, one thing most agree on is that the response body of a WSGI app should generally be bytes and not unicode, since HTTP is a bytes-based protocol. Whether Unicode strings will be accepted—and if so what encoding they'll be converted with—remains to be seen, so avoid the issue and return bytes:
return [page.encode('utf-8')]
(The [] are needed because WSGI apps should return an iterable that's output and flushed an item at a time. If you pass a string on its own, that's used as an iterable and returned a character at a time, which is horrible for performance.)
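Putting that together, here is a minimal sketch of the question's app with the body encoded to bytes and wrapped in a list. The Content-Length header and the serve() helper are my own additions for illustration, not something the question or wsgiref requires.

```python
from wsgiref.simple_server import make_server

page = "<html><body>äöü€ßÄÖÜ</body></html>"

def application(environ, start_response):
    # Encode once, using the same charset that the header declares.
    body = page.encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=UTF-8"),
        # Optional addition: length in bytes, not in characters.
        ("Content-Length", str(len(body))),
    ])
    # A one-item list, so the server writes the whole body in one go
    # instead of iterating over it piece by piece.
    return [body]

def serve(port=8000):
    # Hypothetical helper; call serve() to start the test server.
    httpd = make_server("", port, application)
    print("Serving on port %d..." % port)
    httpd.serve_forever()
```

Call serve() and reload the page; the browser should then receive the UTF-8 bytes that the Content-Type header promises.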
Upvotes: 8
Reputation: 123662
Those characters are not UTF-8; they are latin-1. If you put those literals into your Python source code (which you shouldn't do), you need to declare the encoding of the file, by placing the following line at the top:
#-*- coding: latin-1 -*-
and serving in latin-1:
start_response("200 Ok", [("Content-Type", "text/html; charset=latin-1")])
Assuming you meant to do everything in UTF-8, you need to look up the code points for those characters. You can then do
page = "\x--\x--...\x--"
and serve that up as Unicode.
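As a rough sketch of the lookup step, ord() gives the code point of each character, and the escaped literal can then be compared against the original string. The variable names are only illustrative. Note that € is U+20AC, which is above 0xFF, so it needs a \u escape rather than \x.

```python
text = "äöü€ßÄÖÜ"

# Look up the code point of every character in the string.
code_points = ["U+%04X" % ord(ch) for ch in text]
print(code_points)

# The same string written with escapes only (plain str literal in Python 3).
# \xNN covers code points up to U+00FF; the euro sign needs \u20ac.
escaped = "\xe4\xf6\xfc\u20ac\xdf\xc4\xd6\xdc"
print(escaped == text)
```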
Note that you can verify this by changing the encoding of your browser; if you manually change it to latin-1 the characters will display fine.
Upvotes: 0