Reputation: 1200
I have a large number of objects in an S3 bucket (10 million+). I began looping through them and processing each one, but my process was interrupted halfway through. I'd like to know whether it's possible to restart the loop at a specific object. For example, I would start at the object that has the key "test.jpeg". I would really rather not start the entire process over.
Here is what I have for looping.
for bucket_obj in bucket.objects.filter():
    print(bucket_obj.key)
Upvotes: 1
Views: 1392
Reputation: 1200
# Marker is exclusive: listing resumes with the first key after "test.jpeg"
for bucket_obj in bucket.objects.filter(Marker="test.jpeg"):
    print(bucket_obj.key)
Upvotes: 0
Reputation: 156
In my opinion, you can utilize the marker property of the filter function. You can start your loop from the desired marker.
If you know where your loop is failing, you can use getMarker there and restart the loop with that value. Here is an example.
bucket.listObjects({Prefix: '2015-02', Marker: '2015-02-23-00:00:00'}, callback);
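For the Python loop in the question, the same idea might look roughly like the sketch below. It assumes `bucket` is a boto3 `Bucket` resource; the helper name `resume_processing`, the `process` callback, and the checkpoint file `last_key.txt` are hypothetical names chosen for illustration. The last successfully processed key is written to a file, and on restart it is passed as `Marker`, so listing continues with the key after it.

```python
import os


def resume_processing(bucket, process, checkpoint_path="last_key.txt"):
    """Process each object in `bucket`, resuming after the last saved key.

    `bucket` is assumed to behave like a boto3 Bucket resource, i.e. it
    exposes `bucket.objects.filter(**kwargs)`. `process` is a hypothetical
    callback that handles one object.
    """
    kwargs = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            last_key = f.read().strip()
        if last_key:
            # Marker is exclusive: S3 starts listing after this key.
            kwargs["Marker"] = last_key

    for obj in bucket.objects.filter(**kwargs):
        process(obj)
        # Record the key only after it was processed successfully.
        with open(checkpoint_path, "w") as f:
            f.write(obj.key)


# Possible usage (bucket name is a placeholder):
#   import boto3
#   bucket = boto3.resource("s3").Bucket("my-bucket")
#   resume_processing(bucket, lambda obj: print(obj.key))
```

Writing the checkpoint after every object keeps the sketch simple; with millions of objects it may be cheaper to save it every few hundred keys, at the cost of reprocessing a few objects after a crash.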
Alternatively, although it is bad practice, you can store all the objects in a list or dict. Mark each one as visited as you loop through them, and if the loop fails, loop only over the objects that have not been visited. This method can use a lot of memory, since you have millions of objects.
Upvotes: 1