Reputation: 47
Using requests library to execute http GET that return JSON response i'm getting this error when response string contains unicode char:
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 20 (char 19)
Execute same http request with Postman the json output is:
{ "value": "VILLE D\u0019ANAUNIA" }
My python code is:
data = requests.get(uri, headers=HEADERS).text
json_data = json.loads(data)
Can I remove or replace all Unicode chars before executing conversion with json.loads(...)?
Upvotes: 1
Views: 3206
Reputation: 148870
It is likely to be caused by a RIGHT SINGLE QUOTATION MARK U+2019 (’
). For reasons I cannot guess, the high order byte has been dropped leaving you with a control character which should be escaped in a correct JSON string.
So the correct way would be to control what exactly the API returns. If id does return a '\u0019'
control character, you should contact the API owner because the problem should be there.
As a workaround, you can try to limit the problem for your processing by filtering out non ascii or control characters:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i for i in data if 0x20 <= ord(i) < 127)) # filter out unwanted chars
json_data = json.loads(data)
You should get {'value': 'VILLE DANAUNIA'}
Alternatively, you can replace all unwanted characters with spaces:
data = requests.get(uri, headers=HEADERS).text
data = ''.join((i if 0x20 <= ord(i) < 127 else ' ' for i in data))
json_data = json.loads(data)
You would get {'value': 'VILLE D ANAUNIA'}
Upvotes: 3
Reputation: 23815
The code below works on python 2.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }')
print(d)
The code below works on python 3.7:
import json
d = json.loads('{ "value": "VILLE D\u0019ANAUNIA" }', strict=False)
print(d)
Output:
{u'value': u'VILLE D\x19ANAUNIA'}
Another point is that requests get return the data as json:
r = requests.get('https://api.github.com/events')
r.json()
Upvotes: 2