Pzhang
Pzhang

Reputation: 361

How Python request method set its 'json' parameter with a 'utf-8' encoding value

I have a simple 'Python2' code to patch to my Bigip server with a Chinese character description.

The code sample is just like below:

    # -*- coding: utf-8 -*-
    
    import requests
    import json
    import base64
    
    url = "https://10.13.7.17/mgmt/tm/ltm/virtual/~Project_6fd06a50b7824ae48386565786e94b38~test"
    
    # the unicode is u'bu\u9000'
    test2 = u"bu退"
    
    # encode the unicode character to 'utf-8'
    # 'utf-8' code is {'description': 'bu\xe9\x80\x80'}
    data = {"description": test2.encode('utf-8')}
    payload = data
    
    headers = {
      'Authorization': 'Basic YWRamdflk6YWRtaW5Aasdlfjkla',
    }
    
    # use json parameter rather than data.
    response = requests.request("PATCH", url, headers=headers, json=payload, verify=False)

    # if change the below line to this
    # response = requests.request("PATCH", url, headers=headers, data=test2.encode('utf-8'), verify=False)
    # the HTTP Patch will succeed. but the underlayer code uses the `json` as the parameter, which I cannot simply modify.
        
    print(response.text)

The server will return an error like below:

{"code":400,"message":"double quotes are not balanced","errorStack":[],"apiError":26214401}

I dig into the Request model, I find the Request model set the 'utf-8' code back to the 'unicode' .

The Request model prepares its body. initially, the json parameter is {'description': 'bu\xe9\x80\x80'}, after complexjson.dumps(json) process, the body is set the character back to the unicode format '{"description": "bu\\u9000"}'.

Then the code will not process 'utf-8' encode again, since if not isinstance(body, bytes) is False.

> /usr/lib/python2.7/site-packages/requests/models.py(459)prepare_body()
-> content_type = 'application/json'
(Pdb) l
454
455             if not data and json is not None:
456                 # urllib3 requires a bytes-like body. Python 2's json.dumps
457                 # provides this natively, but Python 3 gives a Unicode string.
458     
459  ->             content_type = 'application/json'
460                 body = complexjson.dumps(json)
461                 if not isinstance(body, bytes):
462                     body = body.encode('utf-8')
463
464             is_stream = all([
(Pdb) p json
{'description': 'bu\xe9\x80\x80'}
(Pdb) n
> /usr/lib/python2.7/site-packages/requests/models.py(460)prepare_body()
-> body = complexjson.dumps(json)
(Pdb) n
> /usr/lib/python2.7/site-packages/requests/models.py(461)prepare_body()
-> if not isinstance(body, bytes):
(Pdb) p body
'{"description": "bu\\u9000"}'
(Pdb) n
> /usr/lib/python2.7/site-packages/requests/models.py(464)prepare_body()
-> is_stream = all([

refer to this github comment, the dumps(json) "By preset, the Python JSON library will automated entweichen select non-ASCII unicode code points".

On python2

json.dumps(data)
'{"description": "bu\\u9000"}'

json.dumps(data, ensure_ascii=False)
u'{"description": "bu\u9000"}'

json.dumps(data).encode('utf-8')
'{"description": "bu\\u9000"}'

json.dumps(data, ensure_ascii=False).encode('utf-8')
'{"description": "bu\xe9\x80\x80"}'

Is there any way to use the json parameter of the 'requests.request' method to post/patch exotic characters? or is it the server-side problem that cannot decode unicode as 'utf-8'?

anyone can help? thanks.

Upvotes: 0

Views: 233

Answers (1)

Touten
Touten

Reputation: 308

The value of your data has a binary string. Try this: test2.encode('utf-8').decode() to convert it to a string.

Upvotes: 0

Related Questions