DarkSpark

Download a million files from an S3 bucket

I have millions of files in different folders in an S3 bucket.

The files are very small. I want to download all the files under the folder named VER1. VER1 contains many subfolders, and I want to download all of the million files under all of its subfolders.

(e.g. VER1 -> sub1 -> file1.txt, VER1 -> sub1 -> subsub1 -> file2.text, etc.)

What is the fastest way to download all the files?

Should I use aws s3 cp or aws s3 sync?

Is there a way to download all the files located under the folder in parallel?

Answers (1)

John Rotenstein

Use the AWS Command-Line Interface (CLI):

aws s3 sync s3://bucket/VER1 [name-of-local-directory]

In my experience, it downloads in parallel, but it won't necessarily use your full bandwidth because there is significant overhead for each object. (It is more efficient for large objects, where that overhead is small relative to the data transferred.)
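To see why per-object overhead dominates with a million small files, a back-of-envelope model helps: if every request costs a roughly fixed latency regardless of object size, and C requests are in flight at once, wall-clock time is about N × latency / C. The 50 ms latency below is an assumed figure for illustration, not a measurement; a concurrency of 10 corresponds to the AWS CLI's default max_concurrent_requests setting for s3 commands.

```python
def estimate_download_seconds(num_objects: int, per_object_latency_s: float, concurrency: int) -> float:
    """Rough wall-clock estimate when per-request overhead dominates.

    Hypothetical model: each request costs a fixed latency and `concurrency`
    requests are always in flight. Bandwidth is ignored entirely, which is
    only reasonable when objects are very small.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return num_objects * per_object_latency_s / concurrency


if __name__ == "__main__":
    # Assumed inputs for illustration only: 1,000,000 objects at 50 ms each.
    for c in (1, 10, 100):
        hours = estimate_download_seconds(1_000_000, 0.05, c) / 3600
        print(f"concurrency={c:>3}: ~{hours:.1f} h")
```

Under these assumptions, raising concurrency matters far more than raw bandwidth, which is why small objects rarely saturate the link.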

aws s3 sync may also have problems with such a large number of files. You'd have to try it to see whether it works.

If you really want full performance, you could write your own code that downloads with massive parallelism, but the time saved would probably be lost in the time it takes to write and test such a program.
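If you do go that route, a minimal sketch in Python might look like the following. It assumes boto3: the client is passed in (for example boto3.client("s3")), keys under the prefix are listed with the list_objects_v2 paginator, and each key is fetched with download_file on a thread pool. The function names, the default of 64 workers and the local directory are illustrative assumptions, not tuned values. Note the trailing slash on the prefix: without it, "VER1" would also match keys such as "VER10/...".

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def list_keys(client, bucket: str, prefix: str):
    """Yield every object key under `prefix`, following pagination.

    S3 has no real folders: "VER1/sub1/subsub1/file2.text" is simply a key
    that starts with "VER1/", so one prefix listing covers every nested level.
    Assumes `client` behaves like boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith("/"):  # skip zero-byte "folder" placeholder objects
                yield key


def local_path_for(key: str, prefix: str, dest_dir: str) -> str:
    """Map an S3 key to a local path, keeping the subfolder structure."""
    relative = key[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(client, bucket: str, prefix: str, dest_dir: str, workers: int = 64) -> int:
    """Download every object under `prefix` using a thread pool; return the count."""

    def fetch(key: str) -> str:
        path = local_path_for(key, prefix, dest_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)  # exist_ok tolerates races between threads
        client.download_file(bucket, key, path)
        return key

    count = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, k) for k in list_keys(client, bucket, prefix)]
        for future in as_completed(futures):
            future.result()  # re-raise any download error
            count += 1
    return count
```

With boto3 installed it would be called as download_prefix(boto3.client("s3"), "bucket", "VER1/", "VER1-local"). Threads suit this workload because each download mostly waits on the network, and boto3's low-level clients can be shared across threads. For millions of keys the list of pending futures gets large, so submitting in batches would keep memory bounded.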

Another option is to use aws s3 sync to download the files to an Amazon EC2 instance, zip them there, and then download the single zip file. That would reduce bandwidth requirements.
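The zip step on the EC2 instance could be sketched like this, run after aws s3 sync has finished. zip_directory and the paths are hypothetical names; how much it helps depends on how well your files compress, so treat it as an option to measure rather than a guaranteed win. The archive should be written outside the source directory so it does not try to include itself.

```python
import os
import zipfile


def zip_directory(src_dir: str, archive_path: str) -> int:
    """Compress every file under src_dir into archive_path; return the file count.

    Paths inside the archive are relative to src_dir, so the subfolder layout
    (sub1/subsub1/...) is preserved when the archive is extracted.
    archive_path is assumed to lie outside src_dir.
    """
    count = 0
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                zf.write(full, arcname=os.path.relpath(full, src_dir))
                count += 1
    return count
```

Transferring one archive replaces a million per-object requests with a single download, which is where most of the saving would come from.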
