Alex

Reputation: 3470

Python Requests -- MemoryError despite using streaming uploads

According to the documentation, it should be possible to perform uploads that are not memory-intensive by giving Request a file-like object rather than the contents of the file. So I do this in my code:

files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream', {'Content-Transfer-Encoding':'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)

Watching it run on my computer with a 2.8 GB file, it starts eating up memory at an alarming rate before bailing out at about 89% memory usage. It then fails with the following output:

  File "***.py", line 644, in post
    r = s.post(url, data=content, params=params, files=files, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
    hooks, stream, verify, cert)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
    prep = self.prepare_request(req)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
    self.prepare_body(data, files)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
    body, content_type = encode_multipart_formdata(new_fields)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
    return body.getvalue(), content_type
MemoryError

It works okay with smaller files, but still eats up a substantial amount of memory while doing so. Am I misunderstanding something?

EDIT:

After seeing Martijn Pieters' answer, I changed my code to this:

    files = {'md5': ('', md5hash),
             'modified': ('', now),
             'created': ('', now),
             'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
    m = requests_toolbelt.MultipartEncoder(fields=files)
    headers['content-type'] = m.content_type
    r = s.post(url, data=m, params=params, headers=headers)

I had to remove the {'Content-Transfer-Encoding': 'binary'} dictionary, because it did not seem to be supported and led to this error message:

  File "***.py", line 647, in post
    m = requests_toolbelt.MultipartEncoder(fields=files)
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
    self._prepare_parts()
  File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
    self.parts = [Part.from_field(f, enc) for f in fields]
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
    yield RequestField.from_tuples(*field)
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
    filename, data = value
ValueError: too many values to unpack

(Is there a way to still set this header when using the multipart encoder? I'd much prefer it to be there.)

However, even with removing that header, it's still not working, because now I'm getting this error message:

  File "***.py", line 647, in post
    r = s.post(url, data=m, params=params, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
    main_key = self.cache.create_key(response.request)
  File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
    key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder

Any ideas? I'll admit that I'm rather new to this and that these error messages are, as they often are in programming, less than helpful.

Upvotes: 2

Views: 7061

Answers (2)

RICHA AGGARWAL

Reputation: 163

A simple change to the way the file is uploaded helped:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

Upvotes: 2

Martijn Pieters

Reputation: 1121844

You are not streaming the upload, because requests can only stream when the whole body is sourced from a single open file object. It will still read all files into memory to build a multipart POST body.
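To see why a single file object can be streamed, here is a minimal illustrative sketch (not requests' actual internals, and the helper name is made up) of sending a body chunk by chunk so only one chunk is ever held in memory:

```python
import io

def iter_body(fileobj, chunk_size=8192):
    # Read the body in fixed-size chunks; memory use stays constant
    # no matter how large the underlying file is.
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Simulate a large upload body with an in-memory file object.
body = io.BytesIO(b'x' * 20000)
chunks = list(iter_body(body))
print(len(chunks))                  # 3
print(sum(len(c) for c in chunks))  # 20000
```

A multipart body, by contrast, interleaves several fields and their headers, which is why requests assembles it up front unless you use a streaming encoder.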

For multi-part uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder:

from requests_toolbelt import MultipartEncoder
import requests

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}
m = MultipartEncoder(fields=dict(files, **params))
headers['content-type'] = m.content_type

r = s.post(url, data=m, headers=headers)
# or, without a session:
# r = requests.post('http://httpbin.org/post', data=m, headers=headers)

The first argument to MultipartEncoder is parsed with the iter_field_objects() function from the urllib3 library; this means that it can either be a dictionary of key-value pairs, or a sequence (list, tuple) of RequestField() objects.

When passing in a dictionary as I did above, each key-value pair is parsed with RequestField.from_tuples(), and you can only specify the field name, the value, and optionally the filename and the mimetype. Extra headers are not supported. I used that option in the above sample.
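For intuition, here is roughly what one part of a multipart/form-data body looks like on the wire. The helper below is hand-rolled purely for illustration (it is not part of requests or urllib3) and shows where a per-part header such as Content-Transfer-Encoding would sit:

```python
def render_part(name, value, filename=None, content_type=None, extra_headers=None):
    # Build the header block and body of one multipart/form-data part.
    disposition = 'Content-Disposition: form-data; name="%s"' % name
    if filename is not None:
        disposition += '; filename="%s"' % filename
    headers = [disposition]
    if content_type is not None:
        headers.append('Content-Type: %s' % content_type)
    for key, val in (extra_headers or {}).items():
        headers.append('%s: %s' % (key, val))
    # Headers are separated from the part body by a blank line.
    return '\r\n'.join(headers) + '\r\n\r\n' + value

part = render_part('file', '<binary data>', filename='video.mp4',
                   content_type='application/octet-stream',
                   extra_headers={'Content-Transfer-Encoding': 'binary'})
print(part)
```

The from_tuples() path can produce the Content-Disposition and Content-Type lines, but has no slot for the extra_headers dictionary; that is the limitation described above.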

If you want to add the Content-Transfer-Encoding header to the file field, you need to use a sequence of RequestField objects instead:

from requests.packages.urllib3.fields import RequestField

fields = [RequestField.from_tuples(*p) for p in params.iteritems()]
for name, value in (('md5', md5hash), ('modified', now), ('created', now)):
    field = RequestField(name, value)
    field.make_multipart()  # adds the Content-Disposition header
    fields.append(field)
# RequestField() takes (name, data, filename=None, headers=None); the
# mimetype is set via make_multipart() rather than positionally:
file_field = RequestField(
    'file', fileobject, filename=os.path.basename(url),
    headers={'Content-Transfer-Encoding': 'binary'})
file_field.make_multipart(content_type='application/octet-stream')
fields.append(file_field)

m = MultipartEncoder(fields=fields)

Note that you cannot combine streaming requests with the requests-cache project; the latter requires access to the full body of the request to produce a cache key.

You'd have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and come up with some kind of hash key for the body. That is outside the scope of this question, however.
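Purely as a sketch of what such a hash key could look like (the helper name is made up, and this assumes the body object can be rewound), you could hash the body in chunks so the key is computed without materialising the upload in memory:

```python
import hashlib
import io

def streaming_body_key(fileobj, chunk_size=8192):
    # Hash the body chunk by chunk, then rewind the file so it can
    # still be streamed out afterwards.
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b''):
        digest.update(chunk)
    fileobj.seek(0)
    return digest.hexdigest()

body = io.BytesIO(b'spam' * 5000)
key = streaming_body_key(body)
print(key == hashlib.md5(b'spam' * 5000).hexdigest())  # True
print(body.tell())  # 0 -- body rewound and ready to upload
```

Wiring something like this into create_key is left as an exercise; it only works when the underlying file object is seekable.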

Upvotes: 8
