fangsterr

Reputation: 3820

django-rest-framework: Unable to handle unicode input (invalid continuation byte)

I'm having trouble handling Unicode characters in the input to my APIView endpoint in django-rest-framework.

I'm using the UnicodeJSONRenderer renderer class and the JSONParser class for input.

My input is as follows, submitted through the browsable API with an HTML form:

{
  "field": "hellö theré"
}

When I call request.DATA in my view, I get the following error message:

{
    "detail": "JSON parse error - 'utf8' codec can't decode byte 0xe2 in position 96: invalid continuation byte"
}

I debugged this pretty extensively, and I can tell that it crashes on line 60 in parsers.py:

data = stream.read().decode(encoding)

I'm not sure how to resolve this. I suspect it has something to do with the encoding, but that doesn't feel right: other code in my codebase (not using django-rest-framework) handles Unicode input and output gracefully, and my settings.DEFAULT_CHARSET is utf-8.

Any help on this would be much appreciated.

UPDATE: I suspect the browsable API is sending character data that is not UTF-8 encoded, even though the meta tag does set the charset to utf-8...

UPDATE 2: I've pasted below the request headers of the POST request sent upon form submission with media type 'application/json'. I thought it was weird that the Content-Type didn't specify a charset. (I found all this using Chrome's debug tools on the POST request being sent):

POST /api/stuff/ HTTP/1.1
Host: localhost:8000
Connection: keep-alive
Content-Length: 356
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Origin: http://localhost:8000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: http://localhost:8000/api/stuff/
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: cookie-info

Upvotes: 3

Views: 4425

Answers (2)

Raydel Miranda

Reputation: 14360

Here is some documentation on the JSON renderers. I suspect the problem is that the code you posted

data = stream.read().decode(encoding)

is trying to decode an already decoded string, since you are using UnicodeJSONRenderer. If you follow the link, you will see that UnicodeJSONRenderer has no charset, so its content can't be decoded.

Try using another renderer, such as JSONPRenderer or HTMLFormRenderer.

Upvotes: 0

dotancohen

Reputation: 31471

The problem is that your input is not UTF-8! In UTF-8, the byte 0xe2 is the lead byte of a three-byte sequence and must be followed by two continuation bytes (each in the range 0x80 to 0xbf); the error means that the byte after it is not one. In Windows-1252, however, 0xe2 is â. Just make sure that you decode the byte stream as Windows-1252 (called cp1252 in Python):

text.decode('cp1252')
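To make the difference concrete, here is a minimal Python 3 sketch. The sample bytes and the helper name decode_with_fallback are made up for illustration; they are not part of django-rest-framework, and the real request body may well look different. It reproduces the "invalid continuation byte" error and then falls back to cp1252 only when UTF-8 decoding fails:

```python
# Hypothetical sample: 0xe2 followed by a space (0x20), which is not a
# valid UTF-8 continuation byte.
raw = b'{"field": "hell\xe2 there"}'


def decode_with_fallback(data, primary="utf-8", fallback="cp1252"):
    """Try the primary codec first; use the fallback only if it fails."""
    try:
        return data.decode(primary)
    except UnicodeDecodeError:
        return data.decode(fallback)


# Strict UTF-8 decoding fails on the lone 0xe2 lead byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

text = decode_with_fallback(raw)
print(utf8_error)
print(text)
```

Falling back silently can hide a client that sends the wrong encoding, so this is probably better suited to debugging than to production code.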

Upvotes: 3
