Reputation: 175
I am trying to find out whether I can list certain objects in S3 in under a second. I have about 200,000 photos in a bucket, and some of the photos are related to others: e.g. 6003-01.jpg is related to 6003-02.jpg. I am using this code to extract them:
import boto3

s3_client = boto3.client('s3')
bucket = 'images'
prefix = 'Photo/'

# Paginate through every object under the prefix and keep the matching keys
paginator = s3_client.get_paginator('list_objects_v2')
response_iterator = paginator.paginate(Bucket=bucket, Prefix=prefix)

file_names = []
for response in response_iterator:
    for object_data in response.get('Contents', []):
        key = object_data['Key']
        if key.startswith('Photo/6003-'):
            file_names.append(key)

print(file_names)
This code works, but it is far too slow. I know the usual method is to use a database, but I want to see if I can avoid that cost.
Do you know of a quicker way?
Would it be possible to script the creation of another bucket from this one, copying the matched images (6003-) into their own 'directory', and then list those objects? Would that be faster, since the listing would cover a smaller prefix?
Thanks.
Upvotes: 0
Views: 3196
Reputation: 269480
The fastest way is to use Amazon S3 Inventory.
It can provide a daily listing of all the objects in an Amazon S3 bucket in CSV format.
Benefit: No need to list the objects yourself
Disadvantage: It is only generated once per day (or weekly), so the listing can be stale
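If the inventory is configured for CSV output, filtering it locally is a simple scan. Here is a minimal sketch, assuming the report file has already been downloaded and uses the default column order (bucket name first, object key second); the `matching_keys` helper and the sample rows are illustrative, not part of any AWS API:

```python
import csv
import io

def matching_keys(inventory_csv_text, key_prefix):
    """Return all object keys in an S3 Inventory CSV that start with key_prefix."""
    matches = []
    for row in csv.reader(io.StringIO(inventory_csv_text)):
        key = row[1]  # second column is the object key in the default schema
        if key.startswith(key_prefix):
            matches.append(key)
    return matches

# Illustrative inventory rows (bucket, key)
sample = (
    '"images","Photo/6003-01.jpg"\n'
    '"images","Photo/6003-02.jpg"\n'
    '"images","Photo/7001-01.jpg"\n'
)
print(matching_keys(sample, 'Photo/6003-'))
```

With 200,000 objects the whole CSV fits easily in memory, so this scan completes in well under a second once the file is local.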
Another method of tracking objects is to have Amazon S3 trigger an AWS Lambda function whenever objects are added or deleted. The Lambda function then stores the object information in a database (e.g. DynamoDB). You then query the database rather than S3, which returns results in milliseconds.
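A rough sketch of such a Lambda function, assuming a DynamoDB table named image-index with a partition key named group (both names are invented for illustration); the handler would be wired to an S3 "ObjectCreated" event notification on the bucket:

```python
import urllib.parse

def group_of(key):
    """Extract the shared group id, e.g. 'Photo/6003-01.jpg' -> '6003'."""
    return key.split('/')[-1].split('-')[0]

def lambda_handler(event, context):
    # Imported inside the handler; requires AWS credentials at runtime.
    import boto3
    table = boto3.resource('dynamodb').Table('image-index')
    for record in event['Records']:
        # S3 event keys are URL-encoded, so decode before storing
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])
        table.put_item(Item={'group': group_of(key), 'key': key})

print(group_of('Photo/6003-01.jpg'))
```

Retrieving all related photos then becomes a single fast query on the partition key, e.g. `table.query(KeyConditionExpression=Key('group').eq('6003'))` using `boto3.dynamodb.conditions.Key`.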
Upvotes: 2