franchyze923

Reputation: 1200

How to loop through s3 bucket starting at a specific object with boto3?

I have a large number of objects in an S3 bucket (10 million+). I began looping through them and processing each one, but my process got interrupted halfway. I'd like to know whether it's possible to restart the loop at a specific object. For example, I would start at the object that has the key "test.jpeg". I would really rather not start the entire process over.

Here is what I have for looping.

for bucket_obj in bucket.objects.filter():
    print(bucket_obj.key)

Upvotes: 1

Views: 1392

Answers (2)

franchyze923

Reputation: 1200

# Listing starts after the Marker key, so "test.jpeg" itself is not returned.
for bucket_obj in bucket.objects.filter(Marker="test.jpeg"):
    print(bucket_obj.key)
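
A minimal sketch of how the Marker approach could be made resumable, assuming the key of the last successfully processed object is written to a local file. The names `load_checkpoint` and `process_from_checkpoint`, the `handle` callback, and the checkpoint path are hypothetical and not part of boto3. S3 begins listing after the Marker key, so the saved key is not processed twice.

```python
import os


def load_checkpoint(path):
    """Return the last processed key, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read() or None


def process_from_checkpoint(bucket, handle, checkpoint_path):
    """Process objects in key order, resuming after the saved key."""
    last_key = load_checkpoint(checkpoint_path)
    # Only pass Marker when a checkpoint exists; otherwise list from the start.
    kwargs = {"Marker": last_key} if last_key else {}
    for obj in bucket.objects.filter(**kwargs):
        handle(obj)
        # Record progress only after the object was handled without error.
        with open(checkpoint_path, "w") as f:
            f.write(obj.key)
```

With a real bucket this might be called as `process_from_checkpoint(boto3.resource("s3").Bucket("my-bucket"), handle, "checkpoint.txt")`, where the bucket name is a placeholder. Writing the file after every object is simple but slow for millions of keys; writing every few thousand objects is a possible trade-off, at the cost of reprocessing a few objects after a crash.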

Upvotes: 0

mihir raj

Reputation: 156

In my opinion, you can utilize the Marker parameter of the filter function to start your loop from the desired key. If you know where your loop is failing, you can use getMarker there and restart the loop with that value. Here is an example.

bucket.listObjects({Prefix: '2015-02', Marker: '2015-02-23-00:00:00'}, callback);
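
The call above appears to use AWS SDK for JavaScript syntax. A rough boto3 equivalent, assuming `bucket` is a boto3 `Bucket` resource, might look like the sketch below. The helper name `keys_after` is hypothetical.

```python
def keys_after(bucket, prefix, marker):
    """Yield keys under `prefix` that are listed after `marker`."""
    # Prefix narrows the listing; Marker tells S3 to start after that key.
    for obj in bucket.objects.filter(Prefix=prefix, Marker=marker):
        yield obj.key
```

For example, `keys_after(bucket, "2015-02", "2015-02-23-00:00:00")` would iterate over the keys that start with `2015-02` and are listed after `2015-02-23-00:00:00`.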

Alternatively, though it is bad practice, you can store all the objects in a list or dict, mark each one as visited as you loop through it, and, if the loop fails, loop over only the objects that have not been visited. This method can take up a lot of memory, as you have millions of objects.
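
A small sketch of this visited-tracking idea, using a set rather than a list so that membership checks stay cheap. The function name `process_unvisited` and the `handle` callback are hypothetical, and the sketch assumes the `visited` set survives the failure (for example, the loop is retried in the same process or the set is saved somewhere).

```python
def process_unvisited(keys, handle, visited):
    """Handle each key not yet in `visited`, marking it once done."""
    for key in keys:
        if key in visited:
            continue  # already processed in an earlier attempt
        handle(key)
        # Mark as visited only after handling succeeded.
        visited.add(key)
```

Every key still has to be listed again on each retry, and the set grows with the number of processed objects, which is the memory cost mentioned above.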

Upvotes: 1
