Reputation: 3470
According to the documentation, it should be possible to do uploads that are not memory intensive, by giving requests a file-like object rather than the contents of the file. Okay, so I do this in the code:
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream',
                  {'Content-Transfer-Encoding': 'binary'})}
r = s.post(url, data=content, params=params, files=files, headers=headers)
Watching it run on my computer, with a 2.8 GB file, it starts eating up memory at an alarming rate, before it bails out when it reaches about 89% memory used. It then fails with the following output:
File "***.py", line 644, in post
r = s.post(url, data=content, params=params, files=files, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 110, in request
hooks, stream, verify, cert)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 348, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 286, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 289, in prepare
self.prepare_body(data, files)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 426, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 144, in _encode_files
body, content_type = encode_multipart_formdata(new_fields)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 101, in encode_multipart_formdata
return body.getvalue(), content_type
MemoryError
It works okay with smaller files, but still eats up a substantial amount of memory while doing so. Am I misunderstanding something?
After seeing Martijn Pieters' answer, I changed my code to this:
files = {'md5': ('', md5hash),
         'modified': ('', now),
         'created': ('', now),
         'file': (os.path.basename(url), fileobject, 'application/octet-stream')}
m = requests_toolbelt.MultipartEncoder(fields=files)
headers['content-type'] = m.content_type
r = s.post(url, data=m, params=params, headers=headers)
I had to remove the {'Content-Transfer-Encoding': 'binary'} entry, because it seemed not to be supported and led to this error message:
File "***.py", line 647, in post
m = requests_toolbelt.MultipartEncoder(fields=files)
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 89, in __init__
self._prepare_parts()
File "/usr/local/lib/python2.7/dist-packages/requests_toolbelt/multipart/encoder.py", line 171, in _prepare_parts
self.parts = [Part.from_field(f, enc) for f in fields]
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/filepost.py", line 44, in iter_field_objects
yield RequestField.from_tuples(*field)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/fields.py", line 97, in from_tuples
filename, data = value
ValueError: too many values to unpack
(Is there a way to still set this header when using the multipart encoder? I'd much prefer it to be there.)
However, even with removing that header, it's still not working, because now I'm getting this error message:
File "***.py", line 647, in post
r = s.post(url, data=m, params=params, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 424, in post
return self.request('POST', url, data=data, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/core.py", line 114, in request
main_key = self.cache.create_key(response.request)
File "/usr/local/lib/python2.7/dist-packages/requests_cache/backends/base.py", line 156, in create_key
key.update(_to_bytes(request.body))
TypeError: must be convertible to a buffer, not MultipartEncoder
Any ideas? I'll admit that I'm rather new to this and that these error messages are, as they often are in programming, less than helpful.
Upvotes: 2
Views: 7061
Reputation: 163
A simple change in the way the file is uploaded helped:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
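If you want more control over how much is read at a time, a generator body works too; requests sends any iterable of bytes with chunked transfer encoding. A minimal sketch (the URL and chunk size here are placeholders):

import requests

def read_in_chunks(path, chunk_size=64 * 1024):
    # Yield the file piece by piece so only one chunk is in memory at a time.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk

requests.post('http://some.url/streamed', data=read_in_chunks('massive-body'))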
Upvotes: 2
Reputation: 1121844
You are not streaming the upload, because requests can only do that if the whole body is sourced from an open file object. It'll still read all files into memory to build a multi-part POST body.
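To make that distinction concrete (the URL and file name here are just placeholders):

import requests

# Streamed: the request body is the open file itself, read in chunks.
with open('big.bin', 'rb') as f:
    requests.post('http://example.com/upload', data=f)

# Not streamed: requests encodes the whole multipart body in memory first.
with open('big.bin', 'rb') as f:
    requests.post('http://example.com/upload', files={'file': f})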
For multi-part uploads, use the requests toolbelt; it includes a Streaming Multipart Data Encoder:
from requests_toolbelt import MultipartEncoder
import requests
import os

files = {
    'md5': ('', md5hash),
    'modified': ('', now),
    'created': ('', now),
    'file': (os.path.basename(url), fileobject, 'application/octet-stream')
}

m = MultipartEncoder(fields=dict(files, **params))
headers['content-type'] = m.content_type

# With your session:
r = s.post(url, data=m, headers=headers)
# or, as a stand-alone request:
r = requests.post('http://httpbin.org/post', data=m, headers=headers)
The first argument to MultipartEncoder is parsed with the iter_field_objects() function from the urllib3 library; this means that it can either be a dictionary of key-value pairs, or a sequence (list, tuple) of RequestField() objects.
When passing in a dictionary like I did above, each key-value pair is parsed with RequestField.from_tuples(), and you can only specify the field name, the value, and optionally the filename and the mimetype. Extra headers are not supported. I used that option in the above sample.
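For illustration, this is roughly what that parsing amounts to for the entries above (reusing md5hash, url and fileobject from the question's code):

import os
from requests.packages.urllib3.fields import RequestField

# 'md5': ('', md5hash) becomes a plain form field ...
md5_field = RequestField.from_tuples('md5', ('', md5hash))

# ... and the file entry becomes a part with a filename and a content
# type, but this form has no slot for extra per-part headers.
file_field = RequestField.from_tuples(
    'file', (os.path.basename(url), fileobject, 'application/octet-stream'))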
If you want to add the Content-Transfer-Encoding header to the file field, then you need to use a sequence of RequestField objects:
import os
from requests.packages.urllib3.fields import RequestField

fields = [RequestField.from_tuples(*p) for p in params.iteritems()]
for name, value in (('md5', md5hash), ('modified', now), ('created', now)):
    field = RequestField(name, value)
    field.make_multipart()  # renders the part's Content-Disposition header
    fields.append(field)

file_field = RequestField('file', fileobject, filename=os.path.basename(url))
file_field.make_multipart(content_type='application/octet-stream',
                          content_transfer_encoding='binary')
fields.append(file_field)
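For completeness, the resulting list is then passed to MultipartEncoder just like the dictionary was:

m = MultipartEncoder(fields=fields)
headers['content-type'] = m.content_type
r = s.post(url, data=m, headers=headers)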
Note that you cannot combine streaming requests with the requests-cache project; the latter requires access to the full body of the request to produce a cache key. You'd have to patch the requests_cache.backends.base.BaseCache.create_key method to handle MultipartEncoder objects and come up with some kind of hash key for the body. This is outside the scope of this question, however.
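That said, a minimal sketch of such a patch could look like the following; the placeholder body is an assumption, and since it gives every streaming upload the same cache key, in practice you'd probably want to bypass caching for them entirely:

from requests_cache.backends import base
from requests_toolbelt import MultipartEncoder

_original_create_key = base.BaseCache.create_key

def create_key(self, request):
    # Hash a fixed placeholder instead of the streaming body; reading the
    # encoder out here would defeat the point of streaming.
    if isinstance(request.body, MultipartEncoder):
        request = request.copy()
        request.body = b'<streaming-multipart-body>'
    return _original_create_key(self, request)

base.BaseCache.create_key = create_key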
Upvotes: 8