Reputation: 2658
The other questions I could find referred to an older version of Boto. I would like to download the latest file from an S3 bucket. In the documentation I found a method, list_object_versions(), that returns a boolean IsLatest. Unfortunately, I have only managed to set up a connection and download a file. Could you please show me how to extend my code to get the latest file in the bucket? Thank you.
import boto3
from botocore.client import Config
conn = boto3.client('s3',
                    config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
From here I don't know how to get the most recently added file from a bucket called mytestbucket. There are various CSV files in the bucket, but of course all with different names.
import boto3
from botocore.client import Config
s3 = boto3.resource('s3', region_name="eu-west-1", endpoint_url="custom endpoint", aws_access_key_id = '1234', aws_secret_access_key = '1234', config=Config(signature_version="s3", s3={'addressing_style': 'path'}))
my_bucket = s3.Bucket('mytestbucket22')
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)
files = [obj.key for obj in sorted(unsorted, key=get_last_modified, reverse=True)][0:9]
This gives me the following error:
NameError: name 'get_last_modified' is not defined
Upvotes: 29
Views: 78110
Reputation: 251
This handles buckets with more than 1000 objects. It is basically @SaadK's answer without the for loop, using the newer list_objects_v2.
EDIT: Fixes the issue @Timothée-Jeannin identified, so that the latest object across all pages is found.
import boto3
def get_most_recent_s3_object(bucket_name, prefix):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    latest = None
    for page in page_iterator:
        if "Contents" in page:
            latest2 = max(page['Contents'], key=lambda x: x['LastModified'])
            if latest is None or latest2['LastModified'] > latest['LastModified']:
                latest = latest2
    return latest
latest = get_most_recent_s3_object(bucket_name, prefix)
latest['Key']  # --> 'prefix/objectname'
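Since the question asks to download the file, the lookup above can be combined with the client's download_file call. The following is only a sketch: the helper names newest_object and download_newest and the choice of local file name are made up for illustration, and the client is passed in as an argument so the functions work with any object that offers get_paginator and download_file.

```python
import os


def newest_object(pages):
    """Return the entry with the greatest LastModified across all pages, or None."""
    newest = None
    for page in pages:
        # A page of an empty listing carries no "Contents" key.
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest["LastModified"]:
                newest = obj
    return newest


def download_newest(s3_client, bucket_name, prefix="", dest_dir="."):
    """Download the most recently modified object; return its local path or None."""
    paginator = s3_client.get_paginator("list_objects_v2")
    newest = newest_object(paginator.paginate(Bucket=bucket_name, Prefix=prefix))
    if newest is None:
        return None
    # Hypothetical naming choice: keep only the last path component of the key.
    local_path = os.path.join(dest_dir, os.path.basename(newest["Key"]))
    s3_client.download_file(bucket_name, newest["Key"], local_path)
    return local_path
```

With real credentials this would be called as download_newest(boto3.client('s3'), 'mytestbucket'). Keys that end in "/" (folder placeholders) are not handled by this sketch and would need to be filtered out first.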
Upvotes: 25
Reputation: 12609
I also wanted to download the latest file from an S3 bucket, but from a specific folder. Use the following function to get the latest file name, given the bucket name and a prefix (the folder name).
import boto3
def get_latest_file_name(bucket_name, prefix):
    """
    Return the latest file name in an S3 bucket folder.
    :param bucket_name: Name of the S3 bucket.
    :param prefix: Only fetch keys that start with this prefix (folder name).
    """
    s3_client = boto3.client('s3')
    objs = s3_client.list_objects_v2(Bucket=bucket_name)['Contents']
    shortlisted_files = dict()
    for obj in objs:
        key = obj['Key']
        timestamp = obj['LastModified']
        # if key starts with folder name retrieve that key
        if key.startswith(prefix):
            # Adding a new key value pair
            shortlisted_files.update({key: timestamp})
    latest_filename = max(shortlisted_files, key=shortlisted_files.get)
    return latest_filename
latest_filename = get_latest_file_name(bucket_name='use_your_bucket_name', prefix='folder_name/')
Upvotes: 0
Reputation: 1557
If you have a lot of files then you'll need to use pagination as mentioned by helloV. This is how I did it.
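import boto3
# LastModified is a datetime, so objects can be compared on it directly
get_last_modified = lambda obj: obj['LastModified']
s3 = boto3.client('s3')
paginator = s3.get_paginator("list_objects")
page_iterator = paginator.paginate(Bucket="BucketName", Prefix="Prefix")
latest = None
for page in page_iterator:
    if "Contents" in page:
        newest_on_page = max(page["Contents"], key=get_last_modified)
        # keep the newest object seen across all pages, not just the last page
        if latest is None or get_last_modified(newest_on_page) > get_last_modified(latest):
            latest = newest_on_page
last_added = latest['Key'] if latest else None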
Upvotes: 10
Reputation: 231
You can do
import boto3
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='bucket_name', Prefix='prefix')
objects = response['Contents']
latest = max(objects, key=lambda x: x['LastModified'])
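Note that a single list_objects_v2 call returns at most 1000 keys, so on a larger bucket the max above only sees the first batch. One way around this without a paginator is to follow NextContinuationToken by hand. A rough sketch, where iter_objects and latest_object are hypothetical helper names and the client is passed in as an argument:

```python
def iter_objects(s3_client, bucket_name, prefix=""):
    """Yield every object entry under a prefix, following continuation tokens."""
    kwargs = {"Bucket": bucket_name, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when a response holds no keys.
        yield from response.get("Contents", [])
        if not response.get("IsTruncated"):
            break
        # Ask for the next batch, starting where this one stopped.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def latest_object(s3_client, bucket_name, prefix=""):
    """Return the entry with the greatest LastModified, or None if nothing matches."""
    return max(
        iter_objects(s3_client, bucket_name, prefix),
        key=lambda obj: obj["LastModified"],
        default=None,
    )
```

Called as latest_object(boto3.client('s3'), 'bucket_name', 'prefix'), this returns the same kind of dict as the snippet above, or None for an empty listing.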
Upvotes: 23
Reputation: 13447
This is basically the same as helloV's answer, for the case where you use Session as I do.
from boto3.session import Session
import settings
session = Session(aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
                  # assumed setting name; the rest of this call was cut off
                  aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY)
s3 = session.resource("s3")
# last_modified is a datetime, so it can serve as the sort key directly
get_last_modified = lambda obj: obj.last_modified
bckt = s3.Bucket("my_bucket")
objs = sorted(bckt.objects.all(), key=get_last_modified)
last_added = objs[-1].key
Having objs sorted allows you to quickly delete all files but the latest with
for obj in objs[:-1]:
    s3.Object("my_bucket", obj.key).delete()
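For a large bucket, one delete() call per object means one request per object. The client's delete_objects call accepts up to 1000 keys per request, so the same cleanup can be batched. A sketch under those assumptions: chunked and delete_all_but_latest are illustrative names, and the entries are assumed to be the dicts returned by list_objects_v2 rather than resource objects.

```python
def chunked(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def delete_all_but_latest(s3_client, bucket_name, objs, batch_size=1000):
    """Delete every object except the most recently modified one.

    `objs` is a list of dicts with "Key" and "LastModified".
    Returns the key that was kept, or None if `objs` is empty.
    """
    if not objs:
        return None
    ordered = sorted(objs, key=lambda obj: obj["LastModified"])
    stale_keys = [{"Key": obj["Key"]} for obj in ordered[:-1]]
    # delete_objects takes at most 1000 keys per request.
    for batch in chunked(stale_keys, batch_size):
        s3_client.delete_objects(
            Bucket=bucket_name,
            Delete={"Objects": batch, "Quiet": True},
        )
    return ordered[-1]["Key"]
```

Deletion is irreversible on an unversioned bucket, so it is worth printing the list of stale keys and checking it before running this against real data.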
Upvotes: 3
Reputation: 52443
Variation of the answer I provided for: Boto3 S3, sort bucket by last modified. You can modify the code to suit your needs.
import boto3
# LastModified is a datetime, so it can be used as the sort key directly
get_last_modified = lambda obj: obj['LastModified']
s3 = boto3.client('s3')
objs = s3.list_objects_v2(Bucket='my_bucket')['Contents']
# ascending sort puts the newest object last
last_added = [obj['Key'] for obj in sorted(objs, key=get_last_modified)][-1]
If you want to reverse the sort:
[obj['Key'] for obj in sorted(objs, key=get_last_modified, reverse=True)][0]
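If only the newest few keys are needed, as with the [0:9] slice in the question, sorting the whole listing is more work than necessary. heapq.nlargest picks the top n entries without a full sort. A small sketch with a hypothetical helper newest_keys, operating on Contents-style dicts:

```python
import heapq


def newest_keys(objs, n=1):
    """Return the keys of the `n` most recently modified entries, newest first."""
    newest = heapq.nlargest(n, objs, key=lambda obj: obj["LastModified"])
    return [obj["Key"] for obj in newest]
```

For example, newest_keys(objs, 9) would give the nine most recent keys from the objs list above.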
Upvotes: 32
Reputation: 19758
You should be able to download the latest version of the file using the standard download_file call:
import boto3
import botocore
BUCKET_NAME = 'mytestbucket'
KEY = 'fileinbucket.txt'
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'downloadname.txt')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Reference link
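To get the last modified or uploaded file, you can use the following:
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('myBucket')
# sort key: each object's last-modified timestamp
get_last_modified = lambda obj: obj.last_modified
unsorted = []
for file in my_bucket.objects.filter():
    unsorted.append(file)
files = [obj.key for obj in sorted(unsorted, key=get_last_modified,
                                   reverse=True)][0:9]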
As the answer in this reference link states, it's not optimal, but it works.
Upvotes: 0