userfault

Reputation: 199

Strange message size with Python requests

I have a strange situation where a request with smaller content ends up consuming more bytes than a request with larger content.

This is my code:

import requests
import base64

def do_work(image_raw, image_name):
    url = 'http://httpbin.org/post'

    data = {'pic':image_raw}
    res = requests.post(url, data=data, json={'name':image_name})
    print len(image_raw), len(res.content)

    image_enc = base64.b64encode(image_raw)
    res = requests.post(url=url, json={'pic':image_enc, 'name':image_name})
    print len(image_enc), len(res.content)

A typical result is:

68166 925208

90888 182301

The encoded image is 33% larger than the raw image, which makes perfect sense. But why is the first request so much larger? There must be something wrong with the way I formatted the request.

Upvotes: 1

Views: 229

Answers (1)

Imperishable Night

Reputation: 1533

Note: I'm using Python 3, and my image_raw is a bytes object, so the exact numbers will differ (probably because of the differences between bytes, unicode, and str), but the explanation should still apply.

It helps to print out a part of the actual res.content. In the first case:

>>> res.content[:200]
b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "pic": "\\ufffdPNG\\r\\n\\u001a\\n\\u0000\\u0000\\u0000\\rIHDR\\u0000\\u0000\\u0001\\u000f\\u0000\\u0000\\u0000\\u0019\\b\\u0002\\u0000\\u0000\\u0000\\ufffd\\uf'

In the second case:

>>> res.content[:200]
b'{\n  "args": {}, \n  "data": "", \n  "files": {}, \n  "form": {\n    "name": "1.png", \n    "pic": "iVBORw0KGgoAAAANSUhEUgAAAQ8AAAAZCAIAAACgtTxbAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAc'

As you can see, res.content, the body returned by http://httpbin.org/post, is a JSON document that echoes the POST data in human-readable form. In the first case, the POST data contains many bytes that are not printable ASCII, and these are echoed as \uxxxx escape sequences, an encoding even less efficient than base64: each \uxxxx escape takes six bytes, whereas base64 spends four bytes on every three bytes of input. This is why res.content is longer in the first case.
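The size difference can be reproduced locally without any network call. The following is only a rough sketch in Python 3: it assumes the echo behaves like json.dumps applied to the bytes decoded as UTF-8 with replacement characters, which approximates but may not match exactly what httpbin does. The helper names escaped_size and base64_size and the sample data are made up for illustration.

```python
import base64
import json


def escaped_size(raw: bytes) -> int:
    """Length of the JSON string literal for raw bytes echoed as text.

    Assumption: undecodable bytes become U+FFFD, similar to the
    "\\ufffd" sequences visible in the first response.
    """
    text = raw.decode("utf-8", errors="replace")
    # json.dumps escapes control and non-ASCII characters by default.
    return len(json.dumps(text))


def base64_size(raw: bytes) -> int:
    """Length of the JSON string literal for the base64-encoded bytes."""
    encoded = base64.b64encode(raw).decode("ascii")
    return len(json.dumps(encoded))


# Hypothetical stand-in for binary image data: every byte value, repeated.
sample = bytes(range(256)) * 4

print(len(sample), escaped_size(sample), base64_size(sample))
```

With binary input like this, the escaped form should come out noticeably larger than the base64 form, which mirrors the pattern in the question: the smaller raw payload produces the larger response.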

Upvotes: 1
