arjunurs

Reputation: 1182

Uploading data from a Lambda job to S3 is very slow

I’ve implemented an AWS Lambda function using the Serverless framework that receives S3 ObjectCreated events and uncompresses tar.gz files. I’m noticing that copying the extracted files to S3 takes a long time and times out. The .tar.gz file is ~18 MB and the archive contains ~12,000 files. I’ve tried using a ThreadPoolExecutor with a 500s timeout. Any suggestions on how I can work around this issue?

The Lambda code, implemented in Python: https://gist.github.com/arjunurs/7848137321148d9625891ecc1e3a9455
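
In outline, the handler does something like the following (a simplified sketch, not the gist verbatim; the 'extracted/' key prefix and the worker count are illustrative):

import io
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    # Locate the uploaded .tar.gz from the S3 ObjectCreated event
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Download the whole ~18 MB archive into memory
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    tardata = tarfile.open(fileobj=io.BytesIO(body), mode='r:gz')

    def upload(name, data):
        s3.put_object(Bucket=bucket,
                      Key=os.path.join('extracted', key, name),
                      Body=data)

    # Read each member fully into memory, then upload it; with
    # ~12,000 members the per-object PUT latency dominates
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = [
            pool.submit(upload, m.name, tardata.extractfile(m).read())
            for m in tardata.getmembers() if m.isfile()
        ]
        for future in futures:
            future.result()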

Upvotes: 1

Views: 930

Answers (1)

Oluwafemi Sule

Reputation: 38952

There are a number of changes I’d suggest to the gist you have shared.

I suggest avoiding reading the extracted tar file into memory; you can stream its contents directly to the S3 bucket instead:

import os

def extract(filename):
    # s3 (boto3 client), tardata (open tarfile.TarFile), bucket, path,
    # tarname and logger come from the enclosing handler, as in the gist.
    upload_status = 'success'
    try:
        # Stream the member straight from the archive to S3 instead of
        # reading it into memory first
        s3.upload_fileobj(
            tardata.extractfile(filename),
            bucket,
            os.path.join(path, tarname, filename)
        )
    except Exception:
        logger.error(
            'Failed to upload %s in tarfile %s',
            filename, tarname, exc_info=True)
        upload_status = 'fail'
    return filename, upload_status
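
You can then drive extract over the archive's file members. A minimal sequential driver (a sketch assuming tardata is the open tarfile.TarFile from your handler; note that tarfile objects are not thread-safe, so if you keep the ThreadPoolExecutor you will need to serialize reads from the archive):

results = [extract(member.name)
           for member in tardata.getmembers() if member.isfile()]
failed = [name for name, status in results if status == 'fail']
if failed:
    logger.error('%d of %d uploads failed', len(failed), len(results))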

Upvotes: 1
