Tom

Reputation: 464

boto - slow set_contents_from_filename when get_bucket validate is False

I'm trying to upload ~3k files (1 KB each) to S3 with boto, using a GreenPool.

My question:

Why does each get_bucket() call take so long, what causes the trade-off with set_contents_from_filename() time, and how can I get around it? Thanks!

More details:

Code:

import boto
import logging
import time

S3_TRIES = 3  # retry count; defined elsewhere in the original, value assumed here

def upload(bucket_str, key_str, file_path):

    # new s3 connection
    s3 = boto.connect_s3()

    # get bucket (validate=True issues a round trip to S3)
    bucket_time = time.time()
    b = s3.get_bucket(bucket_str, validate=True)
    logging.info('get_bucket Took %f seconds' % (time.time() - bucket_time))

    # get key (local operation, no request yet)
    key_time = time.time()
    key = b.new_key(key_str)
    logging.info('new_key Took %f seconds' % (time.time() - key_time))

    for i in range(S3_TRIES):
        try:
            up_time = time.time()
            key.set_contents_from_filename(file_path,
                                           headers={
                                               "Content-Encoding": "gzip",
                                               "Content-Type": "application/json",
                                           },
                                           policy='public-read')
            logging.info('set_content Took %f seconds' % (time.time() - up_time))
            key.set_acl('public-read')  # redundant with policy='public-read' above, kept from the original
            return True

        except Exception as e:
            logging.info('try_set_content exception iteration - %d, %s' % (i, str(e)))
            _e = e

    raise _e

Upvotes: 0

Views: 386

Answers (1)

Frederic Henri

Reputation: 53713

You can check the docs for get_bucket:

If validate=False is passed, no request is made to the service (no charge/communication delay). This is only safe to do if you are sure the bucket exists.

If the default validate=True is passed, a request is made to the service to ensure the bucket exists. Prior to Boto v2.25.0, this fetched a list of keys (but with a max limit set to 0, always returning an empty list) in the bucket (& included better error messages), at an increased expense. As of Boto v2.25.0, this now performs a HEAD request (less expensive but worse error messages).

After this, when you call set_contents_from_filename, boto needs to open the key on S3, so the request (and the connection setup) happens at that point; the round trip you saved in get_bucket just moves to the upload call.
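In other words, once you are sure the bucket exists, you can create the connection and the bucket handle once with validate=False and reuse them for every upload. A minimal sketch of that rearrangement ('my-bucket' is a placeholder name):

    import boto

    # Connection and bucket handle created once, outside the per-file worker.
    # validate=False means no request is made here.
    s3 = boto.connect_s3()
    bucket = s3.get_bucket('my-bucket', validate=False)

    def upload(key_str, file_path):
        key = bucket.new_key(key_str)
        # The first (and only) round trip to S3 happens here, during the upload.
        key.set_contents_from_filename(file_path,
                                       headers={'Content-Encoding': 'gzip',
                                                'Content-Type': 'application/json'},
                                       policy='public-read')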

Coming back to your question about uploading a large number of files: since you tagged your question with boto3, I would suggest you move to boto3 and look at the Transfer Manager.
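A rough boto3 equivalent, as a sketch only (the bucket name, worker count, and the pairs iterable are placeholders). upload_file drives the transfer manager internally; note that TransferConfig concurrency applies to the parts of one transfer, so with many small files you still fan out across files yourself:

    import boto3
    from boto3.s3.transfer import TransferConfig
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client('s3')  # boto3 clients are thread-safe
    config = TransferConfig(use_threads=True)  # per-transfer multipart/thread settings

    def upload(key_str, file_path):
        # upload_file uses the transfer manager under the hood
        s3.upload_file(file_path, 'my-bucket', key_str,
                       ExtraArgs={'ContentEncoding': 'gzip',
                                  'ContentType': 'application/json',
                                  'ACL': 'public-read'},
                       Config=config)

    # Fan out across files with a thread pool.
    with ThreadPoolExecutor(max_workers=20) as pool:
        for key_str, file_path in pairs:  # pairs: your (key, path) iterable
            pool.submit(upload, key_str, file_path)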

Upvotes: 1
