Reputation: 3820
I'm having trouble handling unicode characters in my input to my APIView endpoint in django-rest-framework.
I'm using the UnicodeJsonRenderer renderer class and the JSONParser class for input.
My input is as follows, using the web browsable API with an HTML form:
{
"field": "hellö theré"
}
When I call request.DATA in my view, I get the following error message:
{
"detail": "JSON parse error - 'utf8' codec can't decode byte 0xe2 in position 96: invalid continuation byte"
}
I debugged this pretty extensively, and I can tell that it crashes on line 60 in parsers.py:
data = stream.read().decode(encoding)
I'm not really sure how to resolve this issue. Though I suspect it has something to do with the encoding format, it doesn't feel right to me because I have other code in my codebase (not using the django-rest-framework library) that handles unicode input/output gracefully, as my settings.DEFAULT_CHARSET is utf-8.
Any help on this would be much appreciated.
UPDATE: I suspect it has something to do with the web browsable API sending non UTF-8 encoded character data, though the meta tag does set the charset to utf-8...
UPDATE 2: I've pasted below the request header of the POST request sent upon form submission with media type 'application/json'. I thought it was weird that the content-type didn't specify a charset. (I found all this using Chrome browser debug tools on the POST request being sent):
POST /api/stuff/ HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Content-Length: 356
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Origin: http://localhost:8000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://localhost:8000/api/stuff/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: cookie-info
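One way to check whether a form-encoded body like this really carries UTF-8 is to decode the percent-encoded bytes by hand. Below is a minimal sketch using only the Python 3 standard library. The sample body and the field name `field` are illustrative assumptions, not the actual request payload:

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form body: a browser percent-encodes the UTF-8 bytes of each value.
body = urlencode({"field": "hellö theré"}, encoding="utf-8")
print(body)

# Decoding with the same charset round-trips the original text.
decoded = parse_qs(body, encoding="utf-8", errors="strict")
print(decoded["field"][0])

# Decoding the same bytes as Windows-1252 raises no error here,
# but yields garbled text instead of the original string.
mojibake = parse_qs(body, encoding="cp1252")
print(mojibake["field"][0])
```

If the body decodes cleanly as UTF-8 this way, the wrong bytes are more likely introduced somewhere between the form submission and the parser than by the browser itself.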
Upvotes: 3
Views: 4425
Reputation: 14360
Here is some documentation about the JSON renderers. I suspect the problem is that the code you posted,
data = stream.read().decode(encoding)
is trying to decode an already decoded string, since you are using UnicodeJsonRenderer. If you follow the link, you will see that UnicodeJsonRenderer has no charset, so its output can't be decoded.
Try using another renderer, such as JsonPRenderer or HTMLFormRenderer.
Upvotes: 0
Reputation: 31471
The problem is that your input is not valid UTF-8! In UTF-8, the byte 0xe2 is a lead byte: it starts a three-byte sequence and must be followed by two continuation bytes (each in the range 0x80 to 0xbf). The error message says that the byte following it is not a valid continuation byte. In Windows-1252, however, 0xe2 on its own is â. Just ensure that you properly decode the byte stream using Windows-1252 (called cp1252 in Python):
text.decode('cp1252')
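To see the difference between the two codecs, here is a small sketch in Python 3. The sample string is made up for illustration and is not the payload from the question:

```python
# Hypothetical payload: "â" encoded as Windows-1252 is the single byte 0xE2.
raw = "hâllo".encode("cp1252")  # b'h\xe2llo'

# As UTF-8, 0xE2 announces a three-byte sequence, but the next byte is
# plain ASCII "l", so strict decoding fails.
utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# As Windows-1252, every byte maps to one character.
text = raw.decode("cp1252")

# 0xE2 is legal in UTF-8 when two continuation bytes follow it.
euro = b"\xe2\x82\xac".decode("utf-8")

print(utf8_error)
print(text)
print(euro)
```

Decoding as cp1252 only makes sense if the client really sent Windows-1252. If the bytes were meant to be UTF-8 and got corrupted on the way, fixing the sender is the safer route.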
Upvotes: 3