orion91

Reputation: 47

Python JSON decoder error with unicode characters in request content

Using the requests library to execute an HTTP GET that returns a JSON response, I'm getting this error when the response string contains a Unicode character:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)

When I execute the same HTTP request with Postman, the JSON output is:

{ "value": "VILLE D\u0019ANAUNIA" }

My Python code is:

data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)

Can I remove or replace all Unicode characters before running the conversion with json.loads(...)?

Upvotes: 1

Views: 3206

Answers (2)

Serge Ballesta

Reputation: 148870

It is likely to be caused by a RIGHT SINGLE QUOTATION MARK U+2019 (’). For reasons I cannot guess, the high-order byte has been dropped, leaving you with a control character, which must be escaped in a valid JSON string.

So the correct way would be to check exactly what the API returns. If it does return a '\u0019' control character, you should contact the API owner, because the problem lies there.

As a workaround, you can limit the problem on your side by filtering out non-ASCII and control characters:
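To see why the parser complains, here is a minimal sketch that reproduces the situation without any HTTP call. The payload strings and the try_parse helper are made up for illustration; the only assumption is that the server sends a raw U+0019 character rather than the six-character escape sequence.

```python
import json

# Raw U+0019 inside the JSON string (chr(0x19) is an actual control character).
raw = '{ "value": "VILLE D' + chr(0x19) + 'ANAUNIA" }'

# The same character written as a JSON escape sequence (backslash, u, 0019).
escaped = '{ "value": "VILLE D\\u0019ANAUNIA" }'


def try_parse(text):
    """Return the parsed object, or the error message if parsing fails."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)


raw_result = try_parse(raw)          # an "Invalid control character" message
escaped_result = try_parse(escaped)  # a dict containing the character U+0019
```

Only the escaped form is valid JSON, which is why the fix really belongs on the API side.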

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127))  # filter out unwanted chars
json_data = json.loads(data)

You should get {'value': 'VILLE DANAUNIA'}

Alternatively, you can replace all unwanted characters with spaces:

data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)

You would get {'value': 'VILLE D ANAUNIA'}
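If the guess about U+2019 is right, a narrower workaround is to put the apostrophe back instead of dropping every non-ASCII character, which would also strip legitimate accented letters. This is only a sketch: restore_apostrophe is a hypothetical helper, and it assumes every raw U+0019 in the payload was originally a U+2019.

```python
import json


def restore_apostrophe(text):
    # Assumption: each raw U+0019 is a damaged RIGHT SINGLE QUOTATION MARK.
    return text.replace("\x19", "\u2019")


# Stand-in for requests.get(uri, headers=HEADERS).text
raw = '{ "value": "VILLE D\x19ANAUNIA" }'

json_data = json.loads(restore_apostrophe(raw))
```

Here json_data["value"] becomes 'VILLE D’ANAUNIA'. Other control characters, if any, would still make json.loads fail, so this only helps when U+0019 is the sole offender.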

Upvotes: 3

balderman

Reputation: 23815

The code below works on Python 2.7:

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)

The code below works on Python 3.7 (strict=False allows control characters inside strings):

import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)

Output (on Python 2.7; Python 3 prints the same dict without the u prefixes):

{u'value': u'VILLE D\x19ANAUNIA'}

Another point is that requests can parse the response body as JSON for you:

r = requests.get('https://api.github.com/events')
r.json()
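Note that r.json() parses strictly by default as well, so it would hit the same error on this payload. One possible way to combine the two points is a small fallback helper; loads_lenient is a hypothetical name, and the sketch takes the response text as a plain string so it runs without a network call.

```python
import json


def loads_lenient(text):
    """Parse JSON strictly first; fall back to strict=False on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # strict=False accepts raw control characters inside strings.
        return json.loads(text, strict=False)


# Stand-in for r.text from the failing request
sample = '{ "value": "VILLE D\x19ANAUNIA" }'
parsed = loads_lenient(sample)
```

With requests you would call loads_lenient(r.text). As far as I know, r.json() also forwards keyword arguments to the underlying loads call, so r.json(strict=False) should behave similarly, but check that against the requests version you use.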

Upvotes: 2
