Reputation: 50612
I assumed that any data sent in my query-string parameters would be UTF-8, since that is what my whole site uses throughout. Lo and behold, I was wrong.
This example page has the character ä encoded as UTF-8 in the document (taken from the query string), but it sends B\xe4ule (which is either ISO-8859-1 or Windows-1252) when you click submit. It also fires off an AJAX request, which likewise fails when trying to decode the non-UTF-8 character.
And in Django, my request.POST is really screwed up:
>>> print request.POST
<QueryDict: {u'alias': [u'eu.wowarmory.com/character-sheet.xml?r=Der Rat von Dalaran&cn=B\ufffde']}>
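To see where the \ufffd comes from, here is a minimal Python 3 sketch using only built-in codecs (the value Bäule is the one from the question). It compares the UTF-8 and Windows-1252 bytes for ä and shows what happens when the single-byte form is decoded as UTF-8 with replacement, which is roughly what the server ends up doing. How many characters get swallowed can vary between Python versions, so the result need not match the QueryDict above character for character.

```python
# Python 3 sketch: the same text, two different byte encodings.
name = "B\u00e4ule"  # 'Bäule'

utf8_bytes = name.encode("utf-8")     # ä becomes two bytes: C3 A4
cp1252_bytes = name.encode("cp1252")  # ä becomes one byte: E4 (same in ISO-8859-1)

print(utf8_bytes)    # b'B\xc3\xa4ule'
print(cp1252_bytes)  # b'B\xe4ule'

# Decoding the single-byte form as UTF-8 fails; with errors="replace"
# the invalid byte is swapped for U+FFFD, the replacement character.
mangled = cp1252_bytes.decode("utf-8", errors="replace")
print(ascii(mangled))  # 'B\ufffdule'

# Decoding with the charset the browser actually used recovers the text.
print(cp1252_bytes.decode("cp1252") == name)  # True
```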
How can I just make all these headaches go away and work in UTF-8?
Upvotes: 3
Views: 5491
Reputation: 396
Getting a UTF-8 bytestring from the submitted form should just be a matter of encoding the unicode object:
next = request.POST['next'].encode('utf-8')
For the AJAX request, can you confirm that the request is also being sent as UTF-8 and declared as UTF-8 in the headers?
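Both checks can be sketched in Python 3, where str plays the role of Django's unicode object. The helper declared_charset is a hypothetical name for this sketch: it only reads the charset parameter of a Content-Type value via the standard library's email.message.Message and does not touch Django's request object.

```python
from email.message import Message


def declared_charset(content_type):
    """Hypothetical helper: return the lowercased charset parameter of a
    Content-Type header value, or None if no charset is declared."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The form value as text (a unicode object in Django 1.x on Python 2).
value = "B\u00e4ule"  # 'Bäule'
utf8_value = value.encode("utf-8")
print(utf8_value)  # b'B\xc3\xa4ule'

# An AJAX POST body should come with an explicit charset declaration.
print(declared_charset("application/x-www-form-urlencoded; charset=UTF-8"))  # utf-8
print(declared_charset("application/x-www-form-urlencoded"))  # None
```

If the header carries no charset, the server has to guess, which is exactly where a non-UTF-8 body gets misread.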
Upvotes: 0
Reputation: 12895
Since Django 1.0, all values you get from form submission are unicode objects, not bytestrings as in Django 0.96 and earlier. To get UTF-8 from your values, encode them with the utf-8 codec:
request.POST['somefield'].encode('utf-8')
To get query parameters decoded properly, they have to be properly encoded first:
In [3]: urllib.quote('ä')
Out[3]: '%C3%A4'
I think your problem comes from bad encoding of query parameters.
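On Python 3 the same function lives in urllib.parse, and its encoding argument makes the failure easy to reproduce. In this minimal sketch, quoting ä as UTF-8 round-trips cleanly, while quoting it as Latin-1 (what a Latin-1 page would submit) produces %E4, which unquote cannot decode as UTF-8 and replaces by default.

```python
from urllib.parse import quote, unquote

char = "\u00e4"  # 'ä'

good = quote(char)                     # UTF-8 is the default encoding
bad = quote(char, encoding="latin-1")  # what a Latin-1 page would send

print(good, bad)  # %C3%A4 %E4

# unquote() decodes as UTF-8 with errors="replace" by default.
print(unquote(good) == char)                     # True
print(ascii(unquote(bad)))                       # '\ufffd'
print(unquote(bad, encoding="latin-1") == char)  # True
```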
Upvotes: 3
Reputation: 46683
According to "Get non-UTF-8-form fields as UTF-8 in PHP?", you'll need to make sure the page itself is served using UTF-8 encoding.
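Serving a page as UTF-8 means the bytes must actually be UTF-8 and the page must say so, both in the HTTP header and in the markup. Below is a framework-free sketch; the function render_form_page, the field name cn and the path /lookup/ are hypothetical. In Django the header part is usually handled already: the DEFAULT_CHARSET setting defaults to utf-8, and HttpResponse accepts a content_type such as "text/html; charset=utf-8".

```python
from html import escape


def render_form_page(action):
    """Hypothetical helper: return (headers, body) for a small HTML form
    page that is encoded and declared as UTF-8."""
    page = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'  # charset declared in the markup
        "<title>Lookup</title>\n"
        "</head>\n<body>\n"
        f'<form action="{escape(action)}" method="post" accept-charset="UTF-8">\n'
        '<input name="cn" value="B\u00e4ule">\n'
        '<button type="submit">Submit</button>\n'
        "</form>\n</body>\n</html>\n"
    )
    body = page.encode("utf-8")  # the bytes really are UTF-8...
    headers = {
        "Content-Type": "text/html; charset=utf-8",  # ...and the header says so
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = render_form_page("/lookup/")
print(headers["Content-Type"])  # text/html; charset=utf-8
print(b"B\xc3\xa4ule" in body)  # True
```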
Upvotes: 0
Reputation: 34313
Although, as far as I know, it isn't specified anywhere, all browsers use the character encoding of the HTML page in which the form is embedded as the encoding for submitting the form back to the server. So if you want the URL parameters to be UTF-8 encoded, you have to make sure that the HTML page containing the form is also UTF-8 encoded.
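The effect can be simulated with the standard library: the "browser" percent-encodes the form fields in the page's charset, and a server that assumes UTF-8 decodes them. simulate_submit is a hypothetical helper for this sketch, not a browser or Django API; it relies on the encoding arguments of urllib.parse.urlencode and urllib.parse.parse_qs.

```python
from urllib.parse import parse_qs, urlencode


def simulate_submit(fields, page_charset):
    """Hypothetical helper: encode form fields as a browser would for a page
    in page_charset, then decode them as a UTF-8-only server would."""
    body = urlencode(fields, encoding=page_charset)
    return body, parse_qs(body, encoding="utf-8", errors="replace")


fields = {"cn": "B\u00e4ule"}  # 'Bäule'

utf8_body, utf8_data = simulate_submit(fields, "utf-8")
cp_body, cp_data = simulate_submit(fields, "cp1252")

print(utf8_body)  # cn=B%C3%A4ule
print(cp_body)    # cn=B%E4ule
print(utf8_data["cn"][0] == fields["cn"])  # True
print(ascii(cp_data["cn"][0]))             # 'B\ufffdule'
```

Only the UTF-8 page round-trips; the Windows-1252 page produces the same kind of replacement character the question shows.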
Upvotes: 0