Jcyrss

Reputation: 1800

What is the default encoding when Python Requests posts data of string type?

With the following code:

import requests

payload = '''
 工作报告 
 总体情况:良好 
'''
r = requests.post("http://httpbin.org/post", data=payload)

What is the default encoding when the data posted with Requests is a string: UTF-8 or unicode-escape?

If I want to specify an encoding, do I have to encode the string myself and pass a bytes object as the 'data' parameter?

Upvotes: 9

Views: 23348

Answers (3)

neves

Reputation: 39353

As per the latest JSON spec (RFC 8259), when exchanging data with external services you must encode your JSON payloads as UTF-8. Here is a quick solution:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'))

requests uses httplib (http.client in Python 3), which encodes str bodies as Latin-1. Bytes objects are sent as is, without any automatic encoding, so it is always better to encode your text data yourself and pass a bytes object.

I'd also recommend setting the charset using the headers parameter:

r = requests.post("http://httpbin.org/post", data=payload.encode('utf-8'),
                  headers={'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'})

Upvotes: 11

snakecharmerb

Reputation: 55844

Requests (via urllib3) uses* the standard library's http.client.HTTPConnection.request to send requests. This method encodes str data as Latin-1 but does not encode bytes.

If you provide encoded input you should add a content-type header specifying the encoding used; conversely, if you provide a content-type header you should ensure that the encoding of the body matches that specified.
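As a minimal sketch of keeping the body and the declared charset in step, the helper below encodes the text once and builds the header from the same encoding name. The function name `encode_body` and the `text/plain` media type are my own assumptions for illustration; they are not part of Requests.

```python
def encode_body(text, encoding="utf-8", media_type="text/plain"):
    """Return (body_bytes, headers) where the declared charset matches the bytes.

    Hypothetical helper: the name and the default media type are assumptions.
    """
    body = text.encode(encoding)
    headers = {"Content-Type": f"{media_type}; charset={encoding}"}
    return body, headers


body, headers = encode_body("工作报告")
print(type(body).__name__, headers["Content-Type"])
```

The resulting pair can then be passed along as `requests.post(url, data=body, headers=headers)`, so the header can never drift from the encoding that was actually used.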

From the docs for HTTPConnection.request:

If body is specified, the specified data is sent after the headers are finished. It may be a str, a bytes-like object, an open file object, or an iterable of bytes. If body is a string, it is encoded as ISO-8859-1, the default for HTTP. If it is a bytes-like object, the bytes are sent as is. If it is a file object, the contents of the file is sent; this file object should support at least the read() method. If the file object is an instance of io.TextIOBase, the data returned by the read() method will be encoded as ISO-8859-1, otherwise the data returned by read() is sent as is. If body is an iterable, the elements of the iterable are sent as is until the iterable is exhausted.
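To make the str and bytes cases from that passage concrete, here is a rough imitation of the rule. It is not the actual http.client code, and the function name is made up for the example.

```python
def encode_like_http_client(body):
    """Imitate the documented rule: str -> ISO-8859-1, bytes-like -> unchanged.

    Illustrative only; this is an assumption-laden sketch, not the stdlib code.
    """
    if isinstance(body, str):
        return body.encode("iso-8859-1")
    return bytes(body)


# Characters that exist in Latin-1 are encoded silently.
print(encode_like_http_client("café"))

# Characters outside Latin-1 cannot be encoded at all.
try:
    encode_like_http_client("工作报告")
except UnicodeEncodeError as exc:
    print("cannot encode:", exc.reason)

# Pre-encoded bytes pass through untouched.
print(encode_like_http_client("工作报告".encode("utf-8")))
```

Note the silent case: text such as "café" goes out as Latin-1 without any error, which is a problem if the server expects UTF-8.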

* httplib was renamed to http.client in Python 3

Upvotes: 0

tripleee

Reputation: 189679

If you actually try your example you will find:

$ python
Python 3.7.2 (default, Jan 29 2019, 13:41:02) 
[Clang 10.0.0 (clang-1000.10.44.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>> payload = '''
...  工作报告 
...  总体情况:良好 
... '''
>>> r = requests.post("http://127.0.0.1:8888/post", data=payload)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 116, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/tmp/venv/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/tmp/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1229, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/tmp/venv/lib/python3.7/http/client.py", line 1274, in _send_request
    body = _encode(body, 'body')
  File "/tmp/venv/lib/python3.7/http/client.py", line 160, in _encode
    (name.title(), data[err.start:err.end], name)) from None
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 2-5: Body ('工作报告') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.

As described in "Detecting the character encoding of an HTTP POST request", the default encoding for HTTP POST is ISO-8859-1, aka Latin-1. And as the error message at the very end of the traceback tells you, you can get around this by encoding the string to UTF-8 bytes yourself. But then of course your server needs to be expecting UTF-8 too; otherwise it will decode your bytes as Latin-1 and end up with useless mojibake.

There is no way in the POST interface itself to enforce this, but your server could in fact require clients to explicitly specify their content encoding with the charset parameter of the Content-Type header, perhaps returning a specific 4xx error code with an explicit error message if it's missing.
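A server-side check along those lines might look roughly like this. It is a sketch using only the standard library's email.message.Message to parse the header; the function name `require_charset` and the choice to raise ValueError are assumptions, and how the error maps to an HTTP response is left to whatever framework the server uses.

```python
from email.message import Message


def require_charset(content_type):
    """Return the charset declared in a Content-Type value, or raise ValueError.

    Hypothetical helper: the name and the exception type are assumptions.
    """
    if not content_type:
        raise ValueError("Content-Type header is missing")
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is None:
        raise ValueError("Content-Type must include a charset parameter")
    return charset


print(require_charset("text/plain; charset=UTF-8"))
```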

A somewhat less disciplined alternative is to have your server attempt to decode incoming POST requests as UTF-8, and reject the POST if that fails.
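The decode-and-reject idea could be sketched as follows. The function name and the `(ok, text)` return shape are my own choices for illustration, not taken from any framework.

```python
def decode_utf8_or_reject(body):
    """Try to decode raw request bytes as UTF-8.

    Returns (True, text) on success and (False, None) if the bytes are not
    valid UTF-8. Hypothetical helper; the return shape is an assumption.
    """
    try:
        return True, body.decode("utf-8")
    except UnicodeDecodeError:
        return False, None


# A UTF-8 body decodes cleanly.
print(decode_utf8_or_reject("工作报告".encode("utf-8")))

# A Latin-1 body containing non-ASCII characters is rejected.
print(decode_utf8_or_reject("café".encode("iso-8859-1")))
```

This only catches bodies that are invalid as UTF-8. Pure ASCII is valid in both encodings, so such requests pass regardless of what the client intended.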

Upvotes: 2
