Reputation: 89
This is the structure of my S3 bucket:
Bucket 1
    Company A
        File A-02/01/20
        File A-01/01/20
        File B-02/01/20
        File B-01/01/20
    Company B
        File A-02/01/20
        File A-01/01/20
I am trying to go to Bucket 1, navigate to the Company A folder, find the latest version of File A, and print its modified date. I want to repeat the same steps for File B, and then for File A in the Company B folder. I am new to S3 and Boto3, so I am still learning. This is my code so far:
import boto3
from datetime import datetime, timezone
today = datetime.now(timezone.utc)
s3 = boto3.client('s3', region_name='us-east-1')
objects = s3.list_objects(Bucket='Bucket 1', Prefix='Company A/File')
for o in objects["Contents"]:
    if o["LastModified"] != today:
        print(o["Key"] + " " + str(o["LastModified"]))
This prints out the following:
File A_2019-10-28.csv 2019-11-11 18:31:17+00:00
File A_2020-01-14.csv 2020-01-14 21:17:46+00:00
File A_2020-01-28.csv 2020-01-29 19:19:58+00:00
But all I want is to check the latest file, File A_2020-01-28.csv, and print it only if its modified date is not today, and then do the same for File B.
Upvotes: 3
Views: 16886
Reputation: 1300
Assuming that "File A" will always have a date at the end, you could use the 'A' part in the Prefix search. One thing to keep in mind with S3 is that there is no such thing as folders. That is something you imply by using '/' in the key name. S3 just works on buckets and keys.
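To make the "no folders" point concrete, here is a minimal sketch that needs no AWS access. The keys are hypothetical, modeled on the layout in the question, and matches_prefix is an illustrative helper, not part of Boto3. It only mimics the effect of a Prefix filter, which is a plain string match on the start of the key:

```python
# Hypothetical keys, modeled on the layout in the question.
# Each key is one flat string; the '/' is just a character in it.
keys = [
    "Company A/File A_2020-01-14.csv",
    "Company A/File A_2020-01-28.csv",
    "Company A/File B_2020-01-28.csv",
    "Company B/File A_2020-01-28.csv",
]


def matches_prefix(key, prefix):
    """Illustrative helper: a Prefix filter keeps keys that start with the prefix."""
    return key.startswith(prefix)


# Selecting "File A in the Company A folder" is only a longer prefix.
company_a_file_a = [k for k in keys if matches_prefix(k, "Company A/File A")]
print(company_a_file_a)
```

So moving from Company A to Company B, or from File A to File B, only means changing the prefix string.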
The latest version of that file would be the version that has the newest last_modified field. One approach is to sort the object list (of "A" files) on that attribute:
import boto3
from operator import attrgetter
# Bucket(...).objects is part of the resource API, so create a resource, not a client
s3 = boto3.resource('s3', region_name='us-east-1')
objs = s3.Bucket('Bucket 1').objects.filter(Prefix='Company A/File A')
# sort the objects based on 'obj.last_modified'
sorted_objs = sorted(objs, key=attrgetter('last_modified'))
# The latest version of the file (the last one in the list)
latest = sorted_objs.pop()
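If you stay with the client API from the question, each entry in "Contents" is a dictionary with "Key" and "LastModified", and the same idea can be applied to several files in one pass. The following is only a sketch under assumptions: the sample entries are made up, latest_per_file is a hypothetical helper, and it assumes every key ends in "_<date>.csv" so that the text before the last underscore identifies the logical file.

```python
from datetime import datetime, timezone


def latest_per_file(contents):
    """Map each logical file name to its most recently modified entry.

    Assumption: the part of the key before the last '_' names the file,
    e.g. 'Company A/File A_2020-01-28.csv' -> 'Company A/File A'.
    """
    latest = {}
    for entry in contents:
        name = entry["Key"].rpartition("_")[0]
        current = latest.get(name)
        if current is None or entry["LastModified"] > current["LastModified"]:
            latest[name] = entry
    return latest


# Made-up entries shaped like the "Contents" list of a list_objects response.
contents = [
    {"Key": "Company A/File A_2019-10-28.csv",
     "LastModified": datetime(2019, 11, 11, 18, 31, 17, tzinfo=timezone.utc)},
    {"Key": "Company A/File A_2020-01-28.csv",
     "LastModified": datetime(2020, 1, 29, 19, 19, 58, tzinfo=timezone.utc)},
    {"Key": "Company A/File A_2020-01-14.csv",
     "LastModified": datetime(2020, 1, 14, 21, 17, 46, tzinfo=timezone.utc)},
    {"Key": "Company A/File B_2020-01-14.csv",
     "LastModified": datetime(2020, 1, 14, 21, 20, 0, tzinfo=timezone.utc)},
]

result = latest_per_file(contents)
for name, entry in sorted(result.items()):
    print(name, "->", entry["Key"], entry["LastModified"])
```

This avoids sorting the whole list: it keeps only the newest entry seen so far for each name.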
As an example: I created foo.txt, foo2.txt, foo3.txt in order. Then foo10.txt, foo5.txt. foo5.txt is my latest "foo" file. (Here b is the Bucket resource for my test bucket, foobar.)
>>> b.upload_file('/var/tmp/foo.txt','foo10.txt')
>>> b.upload_file('/var/tmp/foo.txt','foo5.txt')
>>> [i.key for i in b.objects.all()] ## no ordering
['foo.txt', 'foo10.txt', 'foo2.txt', 'foo3.txt', 'foo5.txt']
>>> f2 = sorted(b.objects.all(), key=attrgetter('last_modified'))
>>> f2
[s3.ObjectSummary(bucket_name='foobar', key='foo.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo2.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo3.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo10.txt'), s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')]
>>> f2.pop()
s3.ObjectSummary(bucket_name='foobar', key='foo5.txt')
For more details on Python sorting see: https://wiki.python.org/moin/HowTo/Sorting
Upvotes: 3
Reputation: 10097
Almost there. However, the if statement compares two different datetime objects that contain both date AND time, and the time will differ, so they are practically never equal. If you are after the dates only, change the if to:
if o["LastModified"].date() != today.date():
Works on Python 3.6.9.
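A small self-contained check of the difference, using fixed timestamps instead of a live S3 call. The values are made up for illustration and modified_on_other_day is a hypothetical helper. Both datetimes are in UTC here; .date() drops the timezone, so the comparison only makes sense when both values use the same one.

```python
from datetime import datetime, timezone


def modified_on_other_day(last_modified, now):
    """True when the two timestamps fall on different calendar dates."""
    return last_modified.date() != now.date()


# Fixed, made-up timestamps so the result does not depend on the clock.
now = datetime(2020, 1, 29, 23, 0, 0, tzinfo=timezone.utc)
same_day = datetime(2020, 1, 29, 19, 19, 58, tzinfo=timezone.utc)
older = datetime(2020, 1, 14, 21, 17, 46, tzinfo=timezone.utc)

print(same_day != now)                       # True: the times differ
print(modified_on_other_day(same_day, now))  # False: same calendar date
print(modified_on_other_day(older, now))     # True: different date
```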
Upvotes: 2