Reputation: 1182
I’ve implemented an AWS Lambda using the Serverless framework to receive S3 ObjectCreated events and uncompress tar.gz files. I’m noticing that copying the extracted files to S3 takes a long time and times out. The .tar.gz file is ~18 MB in size and the archive contains ~12,000 files. I’ve tried using a ThreadPoolExecutor with a 500-second timeout. Any suggestions on how I can work around this issue?
The Lambda code, implemented in Python: https://gist.github.com/arjunurs/7848137321148d9625891ecc1e3a9455
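For reference, the pattern in question looks roughly like this (a minimal sketch, not the gist's actual code; the handler name, the 'extracted' destination prefix, and the worker count are assumptions):

import io
import os
import tarfile
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client('s3')

def handler(event, context):
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Pull the ~18 MB archive into memory and open it for random access.
    body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    with tarfile.open(fileobj=io.BytesIO(body), mode='r:gz') as tar:
        with ThreadPoolExecutor(max_workers=16) as pool:
            futures = []
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                # TarFile objects are not thread-safe, so each member is
                # read here and only the bytes are handed to the pool.
                data = tar.extractfile(member).read()
                futures.append(pool.submit(
                    s3.put_object,
                    Bucket=bucket,
                    Key=os.path.join('extracted', member.name),
                    Body=data))
            for future in futures:
                # Matches the 500 s timeout mentioned above.
                future.result(timeout=500)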
Upvotes: 1
Views: 930
Reputation: 38952
There are a number of changes I would suggest to the gist you have shared. Avoid reading the extracted tar file into memory; you can stream its contents directly to the S3 bucket instead:
def extract(filename):
    upload_status = 'success'
    try:
        # upload_fileobj streams the member's file object to S3 in chunks,
        # so the extracted file is never held in memory in full.
        s3.upload_fileobj(
            tardata.extractfile(filename),
            bucket,
            os.path.join(path, tarname, filename)
        )
    except Exception:
        logger.error(
            'Failed to upload %s in tarfile %s',
            filename, tarname, exc_info=True)
        upload_status = 'fail'
    finally:
        return filename, upload_status
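For completeness, extract relies on a few module-level names (s3, logger, tardata, bucket, path, tarname) that are set up elsewhere in the gist. A sketch of that wiring, with assumed values for the bucket, prefix, and archive name; note that TarFile objects are not thread-safe, so if extract is fanned out with a ThreadPoolExecutor, access to tardata needs to be serialized:

import logging
import os
import tarfile

import boto3

logger = logging.getLogger(__name__)
s3 = boto3.client('s3')

# Assumed values; in the gist these are derived from the ObjectCreated event.
bucket = 'my-output-bucket'
path = 'extracted'
tarname = 'archive'

# 'r:gz' decompresses members on demand; nothing is extracted to disk here.
tardata = tarfile.open('/tmp/%s.tar.gz' % tarname, mode='r:gz')

for name in (m.name for m in tardata.getmembers() if m.isfile()):
    logger.info('%s: %s', *extract(name))

Because upload_fileobj only buffers the transfer chunks of each member, memory use stays flat even with ~12,000 files in the archive.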
Upvotes: 1