Reputation: 464
I'm trying to upload ~3k files (1 kilobyte each) to S3 with boto, using a GreenPool.
My question:
Why does the get_bucket() call take so long per call, what causes the trade-off with the set_contents_from_filename() time, and how can I get around it? Thanks!
More details:
get_bucket(validate=True) takes 30 seconds on average, and the following set_contents_from_filename takes under 1 second.
I tried changing to validate=False; this successfully reduced the get_bucket() time to under 1 second, but then the time for set_contents_from_filename jumped up to ~30 seconds. I couldn't find the reason for this trade-off in the boto docs.
Code:
import time
import logging

import boto

S3_TRIES = 3  # retry count (actual value defined elsewhere in my code)

def upload(bucket_str, key_str, file_path):
    # new s3 connection
    s3 = boto.connect_s3()
    # get bucket
    bucket_time = time.time()
    b = s3.get_bucket(bucket_str, validate=True)
    logging.info('get_bucket Took %f seconds' % (time.time() - bucket_time))
    # get key
    key_time = time.time()
    key = b.new_key(key_str)
    logging.info('new_key Took %f seconds' % (time.time() - key_time))
    # upload, retrying on failure
    for i in range(S3_TRIES):
        try:
            up_time = time.time()
            key.set_contents_from_filename(file_path,
                headers={
                    "Content-Encoding": "gzip",
                    "Content-Type": "application/json",
                },
                policy='public-read')
            logging.info('set_content Took %f seconds' % (time.time() - up_time))
            key.set_acl('public-read')
            return True
        except Exception as e:
            logging.info('try_set_content exception iteration - %d, %s' % (i, str(e)))
            _e = e
    # all tries failed; re-raise the last exception
    raise _e
Upvotes: 0
Views: 386
Reputation: 53713
You can check the docs for get_bucket:
If validate=False is passed, no request is made to the service (no charge/communication delay). This is only safe to do if you are sure the bucket exists. If the default validate=True is passed, a request is made to the service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched a list of keys (but with a max limit set to 0, always returning an empty list) in the bucket (& included better error messages), at an increased expense. As of Boto v2.25.0, this now performs a HEAD request (less expensive but worse error messages).
After this, when you call set_contents_from_filename, it needs to open the S3 key, so the request to S3 is made at that point; the latency you saved with validate=False just moves to the first call that actually touches the bucket. That is the trade-off you are seeing.
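One way around paying that cost on every call is to validate the bucket once and reuse the Bucket object for all keys, instead of calling get_bucket() inside upload(). A minimal sketch (single-threaded for clarity; with a GreenPool you would likely want one connection per greenlet, and 'my-bucket' plus the files iterable of (key, path) pairs are placeholders, not from your code):

    import boto

    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-bucket', validate=True)  # one validation round-trip, up front

    for key_str, file_path in files:  # hypothetical iterable of (key, path) pairs
        key = bucket.new_key(key_str)
        key.set_contents_from_filename(file_path, policy='public-read')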
As for your question about uploading a large number of files: since you tagged your question with boto3, I would suggest you move to boto3 and look at the Transfer Manager, as sketched below.
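In boto3, upload_file() runs through the Transfer Manager under the hood. A minimal sketch carrying over the headers and ACL from your code (the TransferConfig value here is an assumption to tune, not a recommendation):

    import boto3
    from boto3.s3.transfer import TransferConfig

    s3 = boto3.client('s3')  # clients are thread-safe, so one can be shared
    config = TransferConfig(max_concurrency=10)  # parallelism within a single transfer

    def upload(bucket_str, key_str, file_path):
        s3.upload_file(
            file_path, bucket_str, key_str,
            ExtraArgs={
                'ContentEncoding': 'gzip',
                'ContentType': 'application/json',
                'ACL': 'public-read',
            },
            Config=config,
        )

Note that max_concurrency parallelizes parts of a single large file; for ~3k small files you would still fan the upload() calls out across your pool.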
Upvotes: 1