Reputation: 149
I have an S3 bucket with paths of the form {productId}/{store}/description.txt. Here's what the bucket might look like at the top level:
ABC123/Store1/description.txt
ABC123/Store2/description.txt
ABC123/Store3/description.txt
DEF123/Store1/description.txt
DEF123/Store2/description.txt
If I had to read all the files pertaining to a certain product ID (for example, ABC123), do I have to navigate into ABC123, list all the store folders, and download each file separately? Or is there a way I can do this with a single API call?
PS: I need to do this programmatically.
Upvotes: 1
Views: 9310
Reputation: 238189
With boto3 you can filter the listing by prefix, but you still have to iterate over the results and download each object individually — there is no single API call that fetches everything under a prefix.
There are a few ways of doing this; I usually download the S3 objects in parallel. For example:
import boto3
from multiprocessing import Pool

session = boto3.Session()
s3r = session.resource('s3')

def s3_downloader(s3_object_tuple):
    bucket_name, key = s3_object_tuple
    s3_object = s3r.Object(bucket_name, key)
    # Flatten the key into a single filename, e.g. ABC123_Store1_description.txt
    out_file = key.replace('/', '_')
    print(f'Downloading s3://{bucket_name}/{key} to /tmp/{out_file}')
    s3_object.download_file('/tmp/' + out_file)
    print(f'Downloading finished s3://{bucket_name}/{key}')

if __name__ == '__main__':
    my_bucket = s3r.Bucket('your_bucket')
    # List every object under the ABC123/ prefix
    objects_to_download = [(my_bucket.name, obj.key)
                           for obj in my_bucket.objects.filter(Prefix='ABC123/')]
    # Download up to 5 objects at a time
    with Pool(5) as p:
        p.map(s3_downloader, objects_to_download)
Upvotes: 3
Reputation: 38
I believe this is a limitation of the AWS console web interface; I have tried (and failed) to do this myself.
Alternatively, you could use a third-party S3 browser client such as http://s3browser.com/
If you have Visual Studio with the AWS Explorer extension installed, you can also browse to Amazon S3 (step 1), select your bucket (step 2), select all the files you want to download (step 3) and right-click to download them all (step 4).
The S3 service has no meaningful limits on simultaneous downloads (easily several hundred downloads at a time are possible) and there is no policy setting related to this... but the S3 console only allows you to select one file for downloading at a time.
Once the download starts, you can start another and another, as many as your browser will let you attempt simultaneously.
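If a single command (rather than a single API call) is good enough, the AWS CLI can download everything under a prefix in one invocation — a sketch, assuming the bucket is named your_bucket and the CLI is already configured with credentials:

aws s3 cp s3://your_bucket/ABC123/ ./ABC123/ --recursive

# Or use sync, which skips files that already exist locally and are up to date
aws s3 sync s3://your_bucket/ABC123/ ./ABC123/

Under the hood this is still many API calls (one LIST per batch of keys, plus one GET per object); the CLI just issues them for you, in parallel.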
In case someone is still looking for an S3 browser and downloader: I have just tried FileZilla Pro (the paid version), and it worked great.
I created a connection to S3 with the access key and secret key set up via IAM. The connection was instant, and downloading all folders and files was fast.
Upvotes: 0