user2966197

Reputation: 2981

Process files in S3 based on their timestamp using Python and boto

I am trying to process files in S3 in order of their timestamps. The code below gives me the last-modified attribute of each file, which I then parse into a usable format with boto.utils.parse_ts. Now I want to sort the files and, if possible, put their key names in a list in sorted order so that the oldest files come first for processing. How can I do this?

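from boto.s3.connection import S3Connection
from boto.utils import parse_ts

conn = S3Connection('', '')  # access key and secret key left blank here
bucket = conn.get_bucket('bucket')
keys = bucket.list('folder1/folder2/')

for key in keys:
    date_modified = parse_ts(key.last_modified)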

Upvotes: 0

Views: 4442

Answers (2)

cgseller

Reputation: 4043

I used a dictionary and sorted the values. This leaves you with the name and the last_modified if you need it. Otherwise, a simple list is probably faster.

from boto.s3.connection import S3Connection

conn = S3Connection()  # assumes region/keys setup in .boto
bucket = conn.get_bucket('mybucket')
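# map each key name to its last_modified string ('modified' avoids shadowing the built-in dict)
modified = {key.name: key.last_modified for key in bucket.get_all_keys()}
# list of (name, last_modified) pairs, sorted ascending so the oldest comes first
modified = sorted(modified.items(), key=lambda x: x[1])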

Example:

from boto.s3.connection import S3Connection
conn = S3Connection()
bucket = conn.get_bucket('cgseller-test')
modified = {key.name: key.last_modified for key in bucket.get_all_keys()}
print modified
>>> {u'newfolder/else': u'2015-04-01T01:33:43.000Z', u'newfolder/file': u'2015-04-01T01:23:51.000Z', u'newfolder/file1': u'2015-04-01T01:23:42.000Z', u'newfolder/file2': u'2015-04-01T01:23:34.000Z'}

modified = sorted(modified.items(), key=lambda x: x[1])
print modified
>>> [(u'newfolder/file2', u'2015-04-01T01:23:34.000Z'), (u'newfolder/file1', u'2015-04-01T01:23:42.000Z'), (u'newfolder/file', u'2015-04-01T01:23:51.000Z'), (u'newfolder/else', u'2015-04-01T01:33:43.000Z')]
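
Sorting the raw last_modified strings works here because they all share one fixed-width ISO 8601 format, so alphabetical order and chronological order coincide. The sketch below checks that against the sample values printed above, assuming every value has that same format. It uses only the standard library, and parse is a hypothetical helper for illustration, not part of boto.

```python
from datetime import datetime

# Sample values copied from the printed output above.
listing = {
    'newfolder/else': '2015-04-01T01:33:43.000Z',
    'newfolder/file': '2015-04-01T01:23:51.000Z',
    'newfolder/file1': '2015-04-01T01:23:42.000Z',
    'newfolder/file2': '2015-04-01T01:23:34.000Z',
}

def parse(ts):
    # Hypothetical helper: parse the fixed-width timestamp format shown above.
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.%fZ')

# Sort once on the raw strings and once on parsed datetimes.
by_string = sorted(listing.items(), key=lambda item: item[1])
by_datetime = sorted(listing.items(), key=lambda item: parse(item[1]))

# Key names only, oldest first.
names_oldest_first = [name for name, _ in by_string]
```

If the timestamps could arrive in mixed formats, parsing them first (as the question does with parse_ts) is the safer sort key.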

Upvotes: 2

garnaat

Reputation: 45856

There are probably lots of ways to do this but here's one way that should work:

import boto.s3
conn = boto.s3.connect_to_region('us-east-1')
bucket = conn.get_bucket('mybucket')
keys = list(bucket.list(prefix='folder1/folder2/'))
keys.sort(key=lambda k: k.last_modified)

The variable keys should now be a list of Key objects sorted by their last_modified attribute, with the oldest first and the newest last.
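
Since the question asks for the key names in a list, one more step after the sort can pull them out. A minimal sketch, using a hypothetical FakeKey stand-in (objects with only name and last_modified attributes) so it runs without an S3 connection; with boto, the sorted keys list from above would take its place.

```python
from collections import namedtuple

# Hypothetical stand-in for boto Key objects: only the two attributes used here.
FakeKey = namedtuple('FakeKey', ['name', 'last_modified'])

keys = [
    FakeKey('folder1/folder2/b.csv', '2015-04-01T01:33:43.000Z'),
    FakeKey('folder1/folder2/a.csv', '2015-04-01T01:23:34.000Z'),
    FakeKey('folder1/folder2/c.csv', '2015-04-01T01:23:51.000Z'),
]

# Same sort as above: ascending by last_modified, so the oldest comes first.
keys.sort(key=lambda k: k.last_modified)

# Keep just the key names, in processing order.
names = [k.name for k in keys]
```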

Upvotes: 2
