sarit

Reputation: 139

Get zip files from one S3 bucket and unzip them to another S3 bucket

I have zip files in one S3 bucket. I need to unzip them and copy the unzipped folders to another S3 bucket, keeping the source path.

For example, if in the source bucket the zip file is under

"s3://bucketname/foo/bar/file.zip"

then in the destination bucket it should be "s3://destbucketname/foo/bar/zipname/files.."

How can this be done? I know it is somehow possible with Lambda, so that I won't have to download the files locally, but I have no idea how.

Thanks!
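The path rule in the question can be pinned down as a small pure function. This is a sketch under two assumptions the question does not spell out: "zipname" means the zip's file name without its ".zip" extension, and each member keeps its relative path inside the archive. The name `destination_key` is hypothetical.

```python
import posixpath


def destination_key(source_key: str, member_name: str) -> str:
    """Map a source zip key plus one archive member to a destination key.

    Assumption: the folder is named after the zip without its extension,
    so foo/bar/file.zip + a.txt becomes foo/bar/file/a.txt.
    """
    folder, zip_filename = posixpath.split(source_key)
    zip_name, _ext = posixpath.splitext(zip_filename)
    # Strip a leading "/" so posixpath.join does not discard the prefix.
    return posixpath.join(folder, zip_name, member_name.lstrip("/"))
```

For example, `destination_key("foo/bar/file.zip", "files.txt")` gives `"foo/bar/file/files.txt"`, which is then used as the key in `destbucketname`.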

Upvotes: 1

Views: 4784

Answers (3)

nejckorasa

Reputation: 649

Python is arguably simpler to use for your Lambda, but if you are considering Java, I've written a library that unzips data in AWS S3 using streaming download and multipart upload.

Unzipping is achieved without keeping data in memory or writing to disk. That makes it suitable for large files; it has been used to unzip files of 100 GB and more.

It is available on Maven Central; the GitHub repository is nejckorasa/s3-stream-unzip.
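The library above is Java and its API is not reproduced here. For readers staying in Python, a rough analogue of the upload half is to hand each member's decompressing file object straight to boto3, so no member is fully extracted to disk or held in memory at once. Unlike the library, this sketch assumes the archive itself is available as a seekable file (for example, downloaded to /tmp), because Python's `zipfile` reads the central directory at the end of the archive. The function name `stream_members_to_bucket` is hypothetical.

```python
import posixpath
import zipfile


def stream_members_to_bucket(zip_file, bucket, dest_prefix):
    """Upload every file in a zip archive without extracting it to disk.

    zip_file: a path or seekable file object holding the archive.
    bucket:   an object with upload_fileobj(Fileobj, Key), e.g. a boto3
              s3.Bucket resource, whose managed transfer reads the stream
              in parts and switches to multipart upload for large members.
    """
    uploaded = []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # S3 has no real folders; skip directory entries
            key = posixpath.join(dest_prefix, info.filename)
            # zf.open() returns a file-like object that decompresses lazily.
            with zf.open(info) as member:
                bucket.upload_fileobj(member, key)
            uploaded.append(key)
    return uploaded
```

In a Lambda this would be called with something like `boto3.resource("s3").Bucket("destbucketname")` and the prefix computed from the source key.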

Upvotes: 0

user13067694


You can use AWS Lambda for this. You can also set up an event notification on your S3 bucket so that a Lambda function is triggered every time a new file arrives. Write the function in Python, using boto3 to connect to S3. Read each zip file into a buffer, unzip it with these libraries, gzip each extracted file and then re-upload it to S3 under your desired folder/path:

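import gzip
import io
import zipfile

# zip_bytes: the zip object read from S3; destination_bucket: a boto3 s3.Bucket;
# dest_prefix: your chosen destination folder (names are illustrative)
def gzip_members_to_bucket(zip_bytes, destination_bucket, dest_prefix):
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zipped:
        for file in zipped.namelist():
            if file.endswith("/"):  # skip directory entries
                continue
            with zipped.open(file, "r") as f_in:
                gzipped_content = gzip.compress(f_in.read())
            final_file_path = f"{dest_prefix}/{file}.gz"  # gzipped copy of each member
            destination_bucket.upload_fileobj(io.BytesIO(gzipped_content),
                                              final_file_path,
                                              ExtraArgs={"ContentType": "text/plain"})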
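The S3 event notification can also be configured from code rather than the console. This is a minimal sketch assuming boto3 and an already-deployed function; the helper name `zip_upload_notification`, the bucket name and the ARN are placeholders. The suffix filter keeps the function from firing on uploads that are not zip files.

```python
def zip_upload_notification(lambda_arn, prefix=""):
    """Build an S3 NotificationConfiguration that invokes a Lambda for new .zip objects."""
    filter_rules = [{"Name": "suffix", "Value": ".zip"}]
    if prefix:
        filter_rules.append({"Name": "prefix", "Value": prefix})
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": filter_rules}},
            }
        ]
    }


# Usage (requires AWS credentials; bucket name and ARN are placeholders):
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="bucketname",
#     NotificationConfiguration=zip_upload_notification(
#         "arn:aws:lambda:REGION:ACCOUNT_ID:function:unzip-function"),
# )
```

Two caveats: `put_bucket_notification_configuration` replaces the bucket's entire notification configuration, and S3 must first be allowed to invoke the function (a Lambda resource policy for the `s3.amazonaws.com` principal), otherwise the call fails validation.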

There is also a tutorial here: https://betterprogramming.pub/unzip-and-gzip-incoming-s3-files-with-aws-lambda-f7bccf0099c9

Upvotes: 0

John Rotenstein

Reputation: 269490

If you want to trigger the above process as soon as the zip file is uploaded to the bucket, you could write an AWS Lambda function.

When the Lambda function is triggered, it receives an event containing the name of the bucket and the key of the object that was uploaded. The function should then:

  • Download the Zip file to /tmp
  • Unzip the file (beware: /tmp storage is 512 MB by default; it can be raised to 10 GB in the function's ephemeral storage setting)
  • Loop through the unzipped files and upload them to the destination bucket
  • Delete all local files created (to free up space for future executions of the function, which may reuse the same /tmp)
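The four steps above might look like the following Python handler. It is a sketch, not the AWS tutorial's code: the environment variable `DEST_BUCKET`, the helper `process_zip`, and the rule that foo/bar/file.zip is unpacked under foo/bar/file/ are assumptions chosen to match the question's layout.

```python
import os
import posixpath
import shutil
import tempfile
import urllib.parse
import zipfile

# Hypothetical configuration: set DEST_BUCKET on the Lambda function.
DEST_BUCKET = os.environ.get("DEST_BUCKET", "destbucketname")


def process_zip(s3, src_bucket, src_key, dest_bucket):
    """Download one zip to /tmp, extract it, upload every file, then clean up.

    `s3` is a boto3 S3 client (anything with download_file/upload_file works).
    """
    workdir = tempfile.mkdtemp(dir="/tmp")  # /tmp is Lambda's writable storage
    try:
        zip_path = os.path.join(workdir, "archive.zip")
        s3.download_file(src_bucket, src_key, zip_path)

        extract_dir = os.path.join(workdir, "extracted")
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(extract_dir)  # strips absolute paths and ".." parts

        # foo/bar/file.zip -> destination prefix foo/bar/file
        folder, zip_filename = posixpath.split(src_key)
        prefix = posixpath.join(folder, posixpath.splitext(zip_filename)[0])

        uploaded = []
        for root, _dirs, files in os.walk(extract_dir):
            for name in files:
                local_path = os.path.join(root, name)
                rel_path = os.path.relpath(local_path, extract_dir)
                key = posixpath.join(prefix, *rel_path.split(os.sep))
                s3.upload_file(local_path, dest_bucket, key)
                uploaded.append(key)
        return uploaded
    finally:
        # The execution environment may be reused; free /tmp for next time.
        shutil.rmtree(workdir, ignore_errors=True)


def lambda_handler(event, context):
    import boto3  # imported here so process_zip can be tried without AWS

    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        process_zip(s3, bucket, key, DEST_BUCKET)
```

Passing the client into `process_zip` lets it be exercised with a stub object. Keeping the destination bucket separate from the source, as the question does, also keeps the function from being re-triggered by its own uploads.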

For a general example, see: Tutorial: Using AWS Lambda with Amazon S3 - AWS Lambda

Upvotes: 1
