Reputation: 309
I have some folders in an S3 bucket, each containing files. S3 lists keys in lexicographic (string) order, much like a Unix directory listing, so the folder numbers come back as 1, 10, 11, 12, 2, 3 instead of 1, 2, 3, 10, 11, 12.
I'd like to read the folders in the sequence 1, 2, 3, 10, 11, 12 and then read the files in them.
I have attached a snippet along with the code that I'm trying, but it's not working the way I want. As you can see, each folder name contains a number (-0.png-analysis, -1.png-analysis, -10.png-analysis, -11.png-analysis, -2.png-analysis), but the ordering is wrong. Is there a way to read them in 0, 1, 2, 3, 10, 11 order?
for i in bucket.objects.all():
    #print(i.key)
    if i.key.endswith('tables.csv'):
        #s = i.key.split('-')[2]
        print(i.key.split('/')[1])
        #print(sorted(s,key = lambda x: x.split('.')))
        #p = i.key.split('-')[2]
        #print(p)
Upvotes: 1
Views: 594
Reputation: 882
As I said, store every object key in a dict, using its sequence number (as an integer) as the dict key, and then iterate over that dict in sorted order.
Here's how it would look:
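import boto3
import collections

s3 = boto3.client('s3')
# 'bucket' is a placeholder; replace it with your bucket name
bucket = boto3.resource('s3').Bucket('bucket')

my_dict = {}
for obj in bucket.objects.all():
    if obj.key.endswith('tables.csv'):
        # Assumes the number is the third '-'-separated field of the folder
        # name, directly before '.png' (e.g. '...-10.png-analysis' -> 10)
        seq = int(obj.key.split('/')[1].split('-')[2].split('.')[0])
        my_dict[seq] = obj.key
print(my_dict)

# Sort by the integer sequence number so that 2 comes before 10
od = collections.OrderedDict(sorted(my_dict.items()))
for k, v in od.items():
    csv_obj = s3.get_object(Bucket='bucket', Key=v)
    print(csv_obj['Body'].read().decode('utf-8'))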
NOTE: This assumes that no two files share the same sequence number. If they do, the dict keeps only the last key seen for that number, and the earlier files with that number can no longer be retrieved.
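If duplicate sequence numbers are a concern, one possible alternative is to skip the dict and sort the list of keys directly with a numeric sort key. The sketch below is only an illustration that runs without S3: the helper name `sequence_number`, the regular expression, and the sample keys (including the `results/scan` prefix) are assumptions based on the folder names shown in the question, so adjust the pattern to match your real keys.

```python
import re

def sequence_number(key):
    """Return the integer that precedes '.png-analysis' in the key (hypothetical helper)."""
    match = re.search(r'-(\d+)\.png-analysis', key)
    if match is None:
        raise ValueError(f'no sequence number found in {key!r}')
    return int(match.group(1))

# Made-up keys standing in for [obj.key for obj in bucket.objects.all()]
keys = [
    'results/scan-0.png-analysis/tables.csv',
    'results/scan-1.png-analysis/tables.csv',
    'results/scan-10.png-analysis/tables.csv',
    'results/scan-11.png-analysis/tables.csv',
    'results/scan-2.png-analysis/extra-tables.csv',
    'results/scan-2.png-analysis/tables.csv',
]

# Compare the numbers as integers, not strings, so 2 sorts before 10.
# sorted() is stable, so keys sharing a number keep their original order.
ordered_keys = sorted(keys, key=sequence_number)
for key in ordered_keys:
    print(key)
```

Because nothing is stored under the sequence number, two keys with the same number both survive the sort. Each key in `ordered_keys` can then be passed to `s3.get_object` in the same way as in the loop above.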
OrderedDict copied from https://stackoverflow.com/a/9001529/9387017
Upvotes: 1