Reputation: 739
I have several daemons that read many files from Amazon S3 using boto. Once every couple of days, I run into a situation where an httplib.IncompleteRead is thrown from deep inside boto. If I try to retry the request, it immediately fails with another IncompleteRead. Even if I call bucket.connection.close(), all further requests still error out.
I feel like I might have stumbled across a bug in boto here, but nobody else seems to have hit it. Am I doing something wrong? All of the daemons are single-threaded, and I've tried setting is_secure both ways.
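For context, each daemon's read path is roughly the following (a simplified sketch, not the exact code; 'my-bucket' is a placeholder):

import boto

conn = boto.connect_s3(is_secure=True)   # also fails with is_secure=False
bucket = conn.get_bucket('my-bucket')    # placeholder name
for key in bucket.list():
    data = key.read()                    # IncompleteRead surfaces somewhere below this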
Traceback (most recent call last):
...
File "<file_wrapper.py",> line 22, in next
line = self.readline()
File "<file_wrapper.py",> line 37, in readline
data = self.fh.read(self.buffer_size)
File "<virtualenv/lib/python2.6/site-packages/boto/s3/key.py",> line 378, in read
self.close()
File "<virtualenv/lib/python2.6/site-packages/boto/s3/key.py",> line 349, in close
self.resp.read()
File "<virtualenv/lib/python2.6/site-packages/boto/connection.py",> line 411, in read
self._cached_response = httplib.HTTPResponse.read(self)
File "/usr/lib/python2.6/httplib.py", line 529, in read
s = self._safe_read(self.length)
File "/usr/lib/python2.6/httplib.py", line 621, in _safe_read
raise IncompleteRead(''.join(s), amt)
Environment:
Upvotes: 12
Views: 5262
Reputation: 64358
I've been struggling with this problem for a while, running long-running processes which read large amounts of data from S3. I decided to post my solution here, for posterity.
First of all, I'm sure the hack pointed to by @Glenn works, but I chose not to use it because I consider it intrusive (it monkey-patches httplib) and unsafe (it blindly returns whatever it got, i.e. return e.partial, even when the incomplete read reflects a real error).
Here is the solution I finally came up with, which seems to be working.
I'm using this general-purpose retrying function:
import time, logging, httplib, socket

def run_with_retries(func, num_retries, sleep=None, exception_types=Exception, on_retry=None):
    for i in range(num_retries):
        try:
            return func()  # call the function
        except exception_types, e:
            # failed on one of the known exceptions
            if i == num_retries - 1:
                raise  # this was the last attempt; reraise
            logging.warning('operation %s failed with error %s. will retry %d more times',
                            func, e, num_retries - i - 1)
            if on_retry is not None:
                on_retry()
            if sleep is not None:
                time.sleep(sleep)
    assert 0  # should not reach this point
Now, when reading a file from S3, I'm using this function, which internally performs retries in case of IncompleteRead errors. Upon an error, before retrying, I call key.close().
def read_s3_file(key):
    """
    Reads the entire contents of a file on S3.
    @param key: a boto.s3.key.Key instance
    """
    return run_with_retries(
        key.read, num_retries=3, sleep=0.5,
        exception_types=(httplib.IncompleteRead, socket.error),
        # close the connection before retrying
        on_retry=lambda: key.close())
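For example, a daemon's main loop could use it roughly like this (a sketch; the bucket name, prefix and process() handler are placeholders):

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')         # placeholder bucket name
for key in bucket.list(prefix='incoming/'):   # placeholder prefix
    data = read_s3_file(key)
    process(data)                             # your own handler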
Upvotes: 5
Reputation: 4043
If you are reading a large amount of data from S3, you may need to chunk your reads/writes using multipart transfers.
There is a good example of doing a multipart upload here: http://www.bogotobogo.com/DevOps/AWS/aws_S3_uploading_large_file.php
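For reference, a multipart upload with boto along the lines of that example looks roughly like this (a sketch; the bucket name, file path and part size are placeholders, and it relies on the third-party filechunkio package):

import math, os
import boto
from filechunkio import FileChunkIO   # third-party helper used in the linked example

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')             # placeholder bucket

source_path = 'big-file.bin'                      # placeholder local file
source_size = os.stat(source_path).st_size

mp = bucket.initiate_multipart_upload(os.path.basename(source_path))
chunk_size = 50 * 1024 * 1024                     # 50 MB per part
chunk_count = int(math.ceil(source_size / float(chunk_size)))

for i in range(chunk_count):
    offset = chunk_size * i
    nbytes = min(chunk_size, source_size - offset)
    with FileChunkIO(source_path, 'r', offset=offset, bytes=nbytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)

mp.complete_upload()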
Upvotes: 0
Reputation: 555
It may well be a bug in boto, but the symptoms you describe are not unique to it. See
https://dev.twitter.com/discussions/9554
Since httplib appears in your traceback, one possible workaround is the patch proposed here:
http://bobrochel.blogspot.in/2010/11/bad-servers-chunked-encoding-and.html?showComment=1358777800048
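The gist of that workaround (shown here only as a sketch of what the post describes, not something I have tested with boto) is to monkey-patch httplib.HTTPResponse.read so that an IncompleteRead returns the partial data instead of raising:

import httplib

def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except httplib.IncompleteRead, e:
            # return whatever data was actually received
            return e.partial
    return inner

httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)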
Disclaimer: I have no experience with boto. This is based on research only and posted since there have been no other responses.
Upvotes: 4