nebula186

Reputation: 149

Downloading multiple files from an S3 bucket

I have an S3 bucket with paths of the form {productId}/{store}/description.txt. Here's what the bucket might look like at the top level:

ABC123/Store1/description.txt
ABC123/Store2/description.txt
ABC123/Store3/description.txt
DEF123/Store1/description.txt
DEF123/Store2/description.txt

If I had to read all the files pertaining to a certain product ID (for example, ABC123), do I have to navigate into ABC123, list all the store folders, and download each file separately? Or is there a way I can do this with a single API call?

PS: I need to do this programmatically

Upvotes: 1

Views: 9310

Answers (2)

Marcin

Reputation: 238189

With boto3 you can filter objects by prefix, but you still have to iterate over the results — there is no single API call that downloads them all.

There are a few ways of doing this, but I usually download the S3 objects in parallel. For example:

import boto3
from multiprocessing import Pool

session = boto3.Session()
s3r = session.resource('s3')
my_bucket = s3r.Bucket('your_bucket')

# Collect (bucket_name, key) tuples for every object under the product prefix
objects_to_download = []
for obj in my_bucket.objects.filter(Prefix='ABC123/'):
    objects_to_download.append((my_bucket.name, obj.key))

def s3_downloader(s3_object_tuple):
    bucket_name, object_key = s3_object_tuple
    s3_object = s3r.Object(bucket_name, object_key)
    # Flatten the key into a single filename, e.g. ABC123_Store1_description.txt
    out_file = object_key.replace('/', '_')
    print(f'Downloading s3://{bucket_name}/{object_key} to {out_file}')
    s3_object.download_file('/tmp/' + out_file)
    print(f'Downloading finished s3://{bucket_name}/{object_key}')

# Download up to 5 objects at a time
with Pool(5) as p:
    p.map(s3_downloader, objects_to_download)
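Because the downloads are I/O-bound, a thread pool from the standard library's `concurrent.futures` works just as well as `multiprocessing` and avoids forking and pickling. A minimal sketch of the same filter-then-download pattern — `your_bucket`, the `/tmp` output directory, and the `download_product_files`/`key_to_local_name` helper names are illustrative, not part of the answer above (note that sharing one boto3 resource across threads is not formally guaranteed thread-safe):

```python
from concurrent.futures import ThreadPoolExecutor


def key_to_local_name(key):
    # Flatten an object key into a single filename:
    # 'ABC123/Store1/description.txt' -> 'ABC123_Store1_description.txt'
    return key.replace('/', '_')


def download_product_files(bucket_name, product_id, out_dir='/tmp', workers=5):
    # Hypothetical helper wrapping the filter-then-download pattern above.
    import boto3  # imported here so key_to_local_name stays usable without boto3

    bucket = boto3.Session().resource('s3').Bucket(bucket_name)
    # List every key under the product prefix, then fetch them concurrently
    keys = [obj.key for obj in bucket.objects.filter(Prefix=f'{product_id}/')]

    def fetch(key):
        bucket.download_file(key, f'{out_dir}/{key_to_local_name(key)}')
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```

Usage would be `download_product_files('your_bucket', 'ABC123')`, which returns the list of keys it downloaded.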

Upvotes: 3

Bandi-Revanth

Reputation: 38

I believe bulk downloading is a limitation of the AWS console web interface; I have tried (and failed) to do this myself.

Alternatively, perhaps use a third-party S3 browser client such as http://s3browser.com/

If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3) and right-click to download them all (step 4).


The S3 service has no meaningful limits on simultaneous downloads (easily several hundred at a time are possible), and there is no policy setting related to this — but the S3 console only allows you to select one file for download at a time.

Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.

In case someone is still looking for an S3 browser and downloader, I have just tried FileZilla Pro (the paid version), and it worked great.

I created a connection to S3 with the Access key and secret key set up via IAM. The connection was instant and downloading all folders and files was fast.

Upvotes: 0
