Reputation: 2981
I am trying to process files in S3 based on their timestamps. I have this code, which gives me the date-modified attribute of each file, and I then parse it into an appropriate format using boto.utils.parse_ts. Now I want to sort the files and, if possible, put their key names in a list in sorted order so that the oldest files come first for processing. How can I do this?
from boto.s3.connection import S3Connection
from boto.utils import parse_ts

conn = S3Connection('', '')
bucket = conn.get_bucket('bucket')
keys = bucket.list('folder1/folder2/')
for key in keys:
    date_modified = parse_ts(key.last_modified)
Upvotes: 0
Views: 4442
Reputation: 4043
I used a dictionary and sorted its items by value. This leaves you with both the name and the last_modified value if you need them. Otherwise, a simple list is probably faster.
from boto.s3.connection import S3Connection
conn = S3Connection() # assumes region/keys setup in .boto
bucket = conn.get_bucket('mybucket')
dict = {key.name:key.last_modified for key in bucket.get_all_keys()}
dict = sorted(dict.items(), key=lambda x: x[1])  # ascending order: oldest first
Example:
from boto.s3.connection import S3Connection
conn = S3Connection()
bucket = conn.get_bucket('cgseller-test')
dict = {key.name:key.last_modified for key in bucket.get_all_keys()}
print dict
>>> {u'newfolder/else': u'2015-04-01T01:33:43.000Z', u'newfolder/file': u'2015-04-01T01:23:51.000Z', u'newfolder/file1': u'2015-04-01T01:23:42.000Z', u'newfolder/file2': u'2015-04-01T01:23:34.000Z'}
dict = sorted(dict.items(), key=lambda x: x[1])
print dict
>>>[(u'newfolder/file2', u'2015-04-01T01:23:34.000Z'), (u'newfolder/file1', u'2015-04-01T01:23:42.000Z'), (u'newfolder/file', u'2015-04-01T01:23:51.000Z'), (u'newfolder/else', u'2015-04-01T01:33:43.000Z')]
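If only the key names are needed, they can be pulled out of the sorted pairs. The following is a minimal sketch that reuses the pairs from the example output above rather than calling S3; the variable names are illustrative, not part of boto. It relies on the timestamps all being fixed-width UTC strings in the same format, so plain string comparison matches chronological order.

```python
# Pairs copied from the example output above: (key name, last_modified string).
pairs = [
    ('newfolder/else', '2015-04-01T01:33:43.000Z'),
    ('newfolder/file', '2015-04-01T01:23:51.000Z'),
    ('newfolder/file1', '2015-04-01T01:23:42.000Z'),
    ('newfolder/file2', '2015-04-01T01:23:34.000Z'),
]

# Fixed-width UTC timestamps in one format sort chronologically as strings.
ordered = sorted(pairs, key=lambda pair: pair[1])

# Keep only the key names, oldest first.
sorted_key_names = [name for name, _ in ordered]
print(sorted_key_names)
```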
Upvotes: 2
Reputation: 45856
There are probably lots of ways to do this, but here's one way that should work:
import boto.s3
conn = boto.s3.connect_to_region('us-east-1')
bucket = conn.get_bucket('mybucket')
keys = list(bucket.list(prefix='folder1/folder2/'))
keys.sort(key=lambda k: k.last_modified)
The variable keys should now be a list of Key objects sorted by the last_modified attribute, with the oldest first and newest last.
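To get the list of key names the question asks for, the sorted objects can be reduced to their name attribute. The sketch below runs without an S3 connection: FakeKey is a hypothetical stand-in for boto's Key objects, and parse_timestamp is an assumed substitute for boto.utils.parse_ts that handles only the timestamp format shown in this thread.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for a boto Key: only the two attributes used here.
FakeKey = namedtuple('FakeKey', ['name', 'last_modified'])


def parse_timestamp(ts):
    """Parse a timestamp such as '2015-04-01T01:23:34.000Z' into a datetime."""
    return datetime.strptime(ts, '%Y-%m-%dT%H:%M:%S.%fZ')


def names_oldest_first(keys):
    """Return the key names ordered by last-modified time, oldest first."""
    ordered = sorted(keys, key=lambda k: parse_timestamp(k.last_modified))
    return [k.name for k in ordered]


# Made-up sample data for illustration only.
sample_keys = [
    FakeKey('folder1/folder2/c.txt', '2015-04-01T01:33:43.000Z'),
    FakeKey('folder1/folder2/a.txt', '2015-04-01T01:23:34.000Z'),
    FakeKey('folder1/folder2/b.txt', '2015-04-01T01:23:51.000Z'),
]
print(names_oldest_first(sample_keys))
```

With real boto Key objects, the same list comprehension over the sorted keys should work, since the code only reads name and last_modified.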
Upvotes: 2