karan

Reputation: 309

Read files in Sequence Python-AWS

I have some folders in an S3 bucket, each containing files. S3 lists keys in lexicographic order, so the folder numbers come back as 1, 10, 11, 12, 2, 3 instead of 1, 2, 3, 10, 11, 12.

I'd like to read the folders in the sequence 1, 2, 3, 10, 11, 12 and then read the files in them.

I have attached a snippet along with the code I'm trying, but it isn't working the way I want. As you can see, each folder name contains a number (-0.png-analysis, -1.png-analysis, -10.png-analysis, -11.png-analysis, -2.png-analysis), but the ordering is incorrect. Is there a way to read them in 0, 1, 2, 3, 10, 11 order?

for i in bucket.objects.all():
    #print(i.key)
    if i.key.endswith('tables.csv'):
        #s = i.key.split('-')[2]
        print(i.key.split('/')[1])
        #print(sorted(s,key = lambda x: x.split('.')))
        #p = i.key.split('-')[2]
        #print(p)


Upvotes: 1

Views: 594

Answers (1)

CaffeinatedCod3r

Reputation: 882

As I said, store all the objects in a dict keyed by their sequence number, then iterate over that dict in sorted key order.

Here's how it would look:


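import boto3
import collections

s3 = boto3.client('s3')
# 'bucket' is a placeholder bucket name; replace it with your own.
bucket = boto3.resource('s3').Bucket('bucket')
my_dict = {}

# Map each folder's sequence number to the key of its tables.csv.
for obj in bucket.objects.all():
    if obj.key.endswith('tables.csv'):
        my_dict[int(obj.key.split('/')[1].split('-')[2].split('.')[0])] = obj.key

print(my_dict)

# Sort by the integer sequence number so that 2 comes before 10.
od = collections.OrderedDict(sorted(my_dict.items()))

for k, v in od.items():
    csv_obj = s3.get_object(Bucket='bucket', Key=v)
    print(csv_obj['Body'].read().decode('utf-8'))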
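
The chained split calls above depend on the exact number of dashes and slashes in the key. As a hedged alternative, here is a minimal sketch that pulls the number out with a regular expression instead. It assumes every folder name contains "-N.png-analysis" as in the question; the sample keys and the helper name `sequence_number` are made up for illustration, and no S3 call is made.

```python
import re

# Assumed folder pattern, based on the names in the question: "...-<N>.png-analysis"
SEQ_PATTERN = re.compile(r"-(\d+)\.png-analysis")


def sequence_number(key):
    """Return the integer N from a key containing '-N.png-analysis'."""
    match = SEQ_PATTERN.search(key)
    if match is None:
        raise ValueError(f"no sequence number found in key: {key!r}")
    return int(match.group(1))


# Hypothetical keys, in the lexicographic order a listing would return them.
keys = [
    "output/doc-page-0.png-analysis/tables.csv",
    "output/doc-page-1.png-analysis/tables.csv",
    "output/doc-page-10.png-analysis/tables.csv",
    "output/doc-page-11.png-analysis/tables.csv",
    "output/doc-page-2.png-analysis/tables.csv",
]

# Sorting on the extracted integer gives numeric order instead of string order.
ordered_keys = sorted(keys, key=sequence_number)
for key in ordered_keys:
    print(key)
```

With real data, `keys` would be the `obj.key` values collected from `bucket.objects.all()`.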

NOTE: I assume no two files share the same sequence number. If they do, the dict keeps only the last key seen for that number, and the earlier files will not be retrieved.
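
If that assumption does not hold, one possible workaround is to collect a list of keys per sequence number rather than a single key. The sketch below is only an illustration: the sample keys and the helper names `folder_number` and `group_by_sequence` are hypothetical, and the key layout is assumed to match the split logic used above.

```python
from collections import defaultdict


def folder_number(key):
    # Same chained split as in the code above; assumes keys shaped like
    # "<prefix>/<a>-<b>-<N>.png-analysis/.../tables.csv".
    return int(key.split('/')[1].split('-')[2].split('.')[0])


def group_by_sequence(keys, extract):
    """Group keys by sequence number, returning (number, keys) pairs in numeric order."""
    groups = defaultdict(list)
    for key in keys:
        groups[extract(key)].append(key)
    return [(number, groups[number]) for number in sorted(groups)]


# Hypothetical keys; folder 2 holds two matching files.
keys = [
    "output/doc-page-1.png-analysis/tables.csv",
    "output/doc-page-10.png-analysis/tables.csv",
    "output/doc-page-2.png-analysis/a/tables.csv",
    "output/doc-page-2.png-analysis/b/tables.csv",
]

grouped = group_by_sequence(keys, folder_number)
for number, folder_keys in grouped:
    print(number, folder_keys)
```

Each key in a group can then be passed to `s3.get_object` in turn, so no file is silently dropped.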

The OrderedDict sorting is copied from https://stackoverflow.com/a/9001529/9387017

Upvotes: 1
