nebula186

Reputation: 149

Downloading multiple files from an S3 bucket

I have an S3 bucket with paths of the form {productId}/{store}/description.txt. Here's what the bucket might look like at the top level:

ABC123/Store1/description.txt
ABC123/Store2/description.txt
ABC123/Store3/description.txt
DEF123/Store1/description.txt
DEF123/Store2/description.txt

If I had to read all the files pertaining to a certain product ID (for example, ABC123), do I have to navigate into ABC123, list all the store folders, and download each file separately? Or is there a way I can do this with a single API call?

PS: I need to do this programmatically

Upvotes: 1

Views: 9310

Answers (2)

Marcin

Reputation: 238189

With boto3 you can filter objects by prefix, but you still have to iterate over the results — there is no single API call that downloads them all.

There are a few ways of doing this, but I usually download the S3 objects in parallel. For example:

import boto3
from multiprocessing import Pool

session = boto3.Session()
s3r = session.resource('s3')
my_bucket = s3r.Bucket('your_bucket')

# Collect (bucket_name, key) tuples for every object under the product prefix
objects_to_download = []
for obj in my_bucket.objects.filter(Prefix='ABC123/'):
    objects_to_download.append((my_bucket.name, obj.key))

def s3_downloader(s3_object_tuple):
    bucket_name, object_key = s3_object_tuple
    s3_object = s3r.Object(bucket_name, object_key)
    # Flatten the key into a single filename, e.g. ABC123_Store1_description.txt
    out_file = object_key.replace('/', '_')
    print(f'Downloading s3://{bucket_name}/{object_key} to {out_file}')
    s3_object.download_file('/tmp/' + out_file)
    print(f'Downloading finished s3://{bucket_name}/{object_key}')

# Download up to 5 objects at a time
with Pool(5) as p:
    p.map(s3_downloader, objects_to_download)
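Because the downloads are I/O-bound, a thread pool from the standard library's `concurrent.futures` works just as well as `multiprocessing` and avoids forking and pickling. A minimal sketch of the same filter-then-download pattern — `your_bucket`, the `/tmp` output directory, and the `download_product_files`/`key_to_local_name` helper names are illustrative, not part of the answer above (note that sharing one boto3 resource across threads is not formally guaranteed thread-safe):

```python
from concurrent.futures import ThreadPoolExecutor


def key_to_local_name(key):
    # Flatten an object key into a single filename:
    # 'ABC123/Store1/description.txt' -> 'ABC123_Store1_description.txt'
    return key.replace('/', '_')


def download_product_files(bucket_name, product_id, out_dir='/tmp', workers=5):
    # Hypothetical helper wrapping the filter-then-download pattern above.
    import boto3  # imported here so key_to_local_name stays usable without boto3

    bucket = boto3.Session().resource('s3').Bucket(bucket_name)
    # List every key under the product prefix, then fetch them concurrently
    keys = [obj.key for obj in bucket.objects.filter(Prefix=f'{product_id}/')]

    def fetch(key):
        bucket.download_file(key, f'{out_dir}/{key_to_local_name(key)}')
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, keys))
```

Usage would be `download_product_files('your_bucket', 'ABC123')`, which returns the list of keys it downloaded.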

Upvotes: 3

Bandi-Revanth

Reputation: 38

I believe bulk downloading is a limitation of the AWS console web interface; I have tried (and failed) to do this myself.

Alternatively, perhaps use a third-party S3 browser client such as http://s3browser.com/

If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3) and right-click to download them all (step 4).


The S3 service has no meaningful limits on simultaneous downloads (easily several hundred at a time are possible), and there is no policy setting related to this — but the S3 console only allows you to select one file for download at a time.

Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.

In case someone is still looking for an S3 browser and downloader, I have just tried FileZilla Pro (the paid version), and it worked great.

I created a connection to S3 with the Access key and secret key set up via IAM. The connection was instant and downloading all folders and files was fast.

Upvotes: 0
