Paul Tarjan

Reputation: 50612

How to deal with query parameter's encoding?

I assumed that any data sent in my parameter strings would be UTF-8, since that is what my whole site uses throughout. Lo and behold, I was wrong.

This example has the character ä in UTF-8 in the document (from the query string) but proceeds to send B\xe4ule (which is either ISO-8859-1 or Windows-1252) when you click submit. It also fires off an AJAX request, which also fails when trying to decode the non-UTF-8 character.

And in Django, my request.POST is really screwed up:

>>> print request.POST
<QueryDict: {u'alias': [u'eu.wowarmory.com/character-sheet.xml?r=Der Rat von Dalaran&cn=B\ufffde']}>

How can I just make all these headaches go away and work in utf8?

Upvotes: 3

Views: 5491

Answers (5)

Nuno Maltez

Reputation: 396

Getting a UTF-8 string from the submitted form should just be a matter of encoding the unicode object:

next = request.POST['next'].encode('utf-8')

For the AJAX request, can you confirm that that request is also being sent as utf-8 and declared as utf-8 in the headers?

Upvotes: 0

six8

Reputation: 2990

You should also add accept-charset="UTF-8" to the <form/> tag.

Upvotes: 1

zgoda

Reputation: 12895

Since Django 1.0, all values you get from a form submission are unicode objects, not bytestrings as in Django 0.96 and earlier. To get UTF-8 from your values, encode them with the utf-8 codec:

request.POST['somefield'].encode('utf-8')

To get query parameters decoded properly, they have to be properly encoded first:

In [3]: urllib.quote('ä')
Out[3]: '%C3%A4'

I think your problem comes from bad encoding of query parameters.
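To illustrate how the page encoding changes the bytes on the wire, here is a minimal sketch using Python 3's `urllib.parse` (the snippet above uses the Python 2 `urllib.quote`). The sample value `Bäule` comes from the question; the variable names are made up for illustration.

```python
from urllib.parse import quote, unquote

value = "Bäule"

# Percent-encode the same text as a UTF-8 page and as an ISO-8859-1 page would.
as_utf8 = quote(value, encoding="utf-8")          # 'B%C3%A4ule'
as_latin1 = quote(value, encoding="iso-8859-1")   # 'B%E4ule'

# A server that assumes UTF-8 only round-trips the UTF-8 form.
decoded_ok = unquote(as_utf8, encoding="utf-8")
# The lone 0xE4 byte is not valid UTF-8, so it becomes U+FFFD.
decoded_bad = unquote(as_latin1, encoding="utf-8", errors="replace")
# Decoding with the encoding the client actually used recovers the text.
decoded_latin1 = unquote(as_latin1, encoding="iso-8859-1")

print(as_utf8, as_latin1)
print(repr(decoded_ok), repr(decoded_bad), repr(decoded_latin1))
```

The U+FFFD replacement character in `decoded_bad` is the same kind of damage that shows up as `\ufffd` in the `QueryDict` from the question.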

Upvotes: 3

Justin Grant

Reputation: 46683

According to "Get non-UTF-8-form fields as UTF-8 in PHP?", you'll need to make sure the page itself is served using UTF-8 encoding.

Upvotes: 0

jarnbjo

Reputation: 34313

Although AFAIK it's not specified anywhere, all browsers use the character encoding of the HTML page containing the form as the encoding for submitting the form back to the server. So if you want the URL parameters to be UTF-8 encoded, you have to make sure that the HTML page containing the form is also UTF-8 encoded.
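If you cannot control the page that submits the form, one defensive option on the server is to try UTF-8 first and fall back to a legacy encoding. This is only a sketch: `decode_form_bytes` is a hypothetical helper, not part of Django, and the choice of Windows-1252 as the fallback is an assumption based on the `\xe4` byte in the question.

```python
def decode_form_bytes(raw: bytes) -> str:
    """Decode raw form bytes as UTF-8, falling back to Windows-1252."""
    try:
        # Strict decoding: raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 input comes from a Western European legacy page.
        return raw.decode("cp1252", errors="replace")


print(decode_form_bytes(b"B\xe4ule"))              # bytes sent by a Latin-1 page
print(decode_form_bytes("Bäule".encode("utf-8")))  # bytes sent by a UTF-8 page
```

Trying UTF-8 first works reasonably well because text in a legacy single-byte encoding is rarely valid UTF-8 by accident, but it is still a guess, so serving the page as UTF-8 remains the cleaner fix.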

Upvotes: 0
