Reputation: 175
I am trying to find out whether I can list certain objects in S3 in under a second. I have about 200,000 photos in a bucket, and some of the photos are related to others: e.g. 6003-01.jpg is related to 6003-02.jpg. I am using this code to extract them:
import boto3

s3_client = boto3.client('s3')
bucket = 'images'
prefix = 'Photo/'

# Paginate through every object under the prefix and keep the matching keys
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []
for response in response_iterator:
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.startswith('Photo/6003-'):
            file_names.append(key)

print(file_names)
This code works, but it is far too slow. I know the usual method is to use a database, but I want to see if I can avoid that cost.
Do you know of a quicker way?
Would it be possible to script the creation of another bucket from this one, copying the matched images (6003-) into their own 'directory', and then list those objects? Would that be faster, since the listing would cover a smaller prefix?
Thanks.
Upvotes: 0
Views: 3196
Reputation: 269480
The fastest way is to use Amazon S3 Inventory.
It can provide a daily listing of all the objects in an Amazon S3 bucket in CSV format.
Benefit: No need to list the objects yourself
Disadvantage: It is only generated once per day (or weekly), so the listing can be stale
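If the inventory is configured for CSV output, filtering it locally is a simple scan. Here is a minimal sketch, assuming the report file has already been downloaded and uses the default column order (bucket name first, object key second); the `matching_keys` helper and the sample rows are illustrative, not part of any AWS API:

```python
import csv
import io

def matching_keys(inventory_csv_text, key_prefix):
    """Return all object keys in an S3 Inventory CSV that start with key_prefix."""
    matches = []
    for row in csv.reader(io.StringIO(inventory_csv_text)):
        key = row[1]  # second column is the object key in the default schema
        if key.startswith(key_prefix):
            matches.append(key)
    return matches

# Illustrative inventory rows (bucket, key)
sample = (
    '"images","Photo/6003-01.jpg"\n'
    '"images","Photo/6003-02.jpg"\n'
    '"images","Photo/7001-01.jpg"\n'
)
print(matching_keys(sample, 'Photo/6003-'))
```

With 200,000 objects the whole CSV fits easily in memory, so this scan completes in well under a second once the file is local.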
Another method of tracking objects is to have Amazon S3 trigger an AWS Lambda function whenever objects are added or deleted. The Lambda function then stores the object information in a database (e.g. DynamoDB). You then query the database rather than S3, which returns results in milliseconds.
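A rough sketch of such a Lambda function, assuming a DynamoDB table named image-index with a partition key named group (both names are invented for illustration); the handler would be wired to an S3 "ObjectCreated" event notification on the bucket:

```python
import urllib.parse

def group_of(key):
    """Extract the shared group id, e.g. 'Photo/6003-01.jpg' -> '6003'."""
    return key.split('/')[-1].split('-')[0]

def lambda_handler(event, context):
    # Imported inside the handler; requires AWS credentials at runtime.
    import boto3
    table = boto3.resource('dynamodb').Table('image-index')
    for record in event['Records']:
        # S3 event keys are URL-encoded, so decode before storing
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        table.put_item(Item={'group': group_of(key), 'key': key})

print(group_of('Photo/6003-01.jpg'))
```

Retrieving all related photos then becomes a single fast query on the partition key, e.g. `table.query(KeyConditionExpression=Key('group').eq('6003'))` using `boto3.dynamodb.conditions.Key`.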
Upvotes: 2