Reputation: 3737
How to check if a local file is the same as a file stored in S3 without downloading it? I want to avoid downloading large files again and again. S3 objects have ETags, but they are difficult to compute if the file was uploaded in parts, and the solution from this question doesn't seem to work. Is there an easier way to avoid unnecessary downloads?
Upvotes: 8
Views: 6820
Reputation: 13176
If you don't need the answer immediately, you can generate an S3 Inventory report and then import it into your database for future use.
Compute the local file's ETag as shown here for a normal file and for a huge multipart file.
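For the multipart case, the ETag is the MD5 of the concatenated binary MD5 digests of the parts, suffixed with the part count. A minimal sketch, assuming the upload used a fixed part size that you know (8 MB is the AWS CLI default; chunk_size must match whatever the uploader actually used):

    import hashlib

    def compute_s3_etag(path, chunk_size=8 * 1024 * 1024):
        """Reproduce the ETag S3 assigns; chunk_size must match the upload's part size."""
        md5s = []
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                md5s.append(hashlib.md5(chunk))

        if not md5s:
            # Empty file: the ETag is the MD5 of zero bytes.
            return hashlib.md5(b'').hexdigest()
        if len(md5s) == 1:
            # Single-part upload: the ETag is just the plain MD5 of the file.
            return md5s[0].hexdigest()
        # Multipart upload: MD5 of the concatenated binary part digests, plus the part count.
        combined = hashlib.md5(b''.join(m.digest() for m in md5s))
        return '{}-{}'.format(combined.hexdigest(), len(md5s))

Compare the result against the ETag returned by a HEAD request; if the chunk size guess is wrong, the comparison will simply report a mismatch.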
Upvotes: 0
Reputation: 52443
I would just compare the last modified time and download if they differ. Additionally, you can compare the size before downloading. Given a bucket, a key and a local file fname:
import boto3
import os.path

def isModified(bucket, key, fname):
    """Return True if the S3 object's last-modified time differs from the local file's mtime."""
    s3 = boto3.resource('s3')
    obj = s3.Object(bucket, key)
    # obj.last_modified is a timezone-aware datetime; compare epoch seconds.
    return int(obj.last_modified.timestamp()) != int(os.path.getmtime(fname))
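A hypothetical wrapper (the bucket, key and path names below are placeholders) that combines the mtime check with the size check mentioned above, so the decision is made from object metadata alone, without downloading:

    def download_if_changed(bucket, key, fname):
        """Fetch the object only when the local copy is missing or looks stale (sketch)."""
        s3 = boto3.resource('s3')
        obj = s3.Object(bucket, key)  # attribute access below issues a HEAD request
        up_to_date = (
            os.path.exists(fname)
            and obj.content_length == os.path.getsize(fname)
            and int(obj.last_modified.timestamp()) == int(os.path.getmtime(fname))
        )
        if not up_to_date:
            obj.download_file(fname)

    # Placeholder names, not from the question:
    download_if_changed('my-bucket', 'path/to/key', '/tmp/local-copy')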
Upvotes: 6
Reputation: 40904
Can you use a small local database, e.g. a text file? Each time you download an object, record its ETag together with a signature (e.g. a hash) of the downloaded file.
Next time, before you proceed with downloading, look up the ETag in the 'database'. If it's there, compute the signature of your existing file and compare it with the signature recorded for that ETag. If they match, the remote file is the same one you already have.
There's a possibility that the same file will be re-uploaded with different chunking, thus changing the ETag. Unless this is very probable, you can just ignore the false negative and re-download the file in that rare case.
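A minimal sketch of such a 'database' as a JSON file mapping remote ETags to a local signature; the file name etag_cache.json and the choice of SHA-256 are assumptions, not from the answer:

    import hashlib
    import json
    import os

    CACHE_PATH = 'etag_cache.json'  # assumed location of the tiny local "database"

    def load_cache():
        if os.path.exists(CACHE_PATH):
            with open(CACHE_PATH) as f:
                return json.load(f)
        return {}

    def save_cache(cache):
        with open(CACHE_PATH, 'w') as f:
            json.dump(cache, f)

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(1 << 20), b''):
                h.update(block)
        return h.hexdigest()

    def need_download(remote_etag, local_path, cache):
        """True unless the cached signature for this ETag matches the local file."""
        known = cache.get(remote_etag)
        return not (known and os.path.exists(local_path) and known == sha256_of(local_path))

    def record_download(remote_etag, local_path, cache):
        cache[remote_etag] = sha256_of(local_path)
        save_cache(cache)

If the same content is re-uploaded with different chunking, its ETag changes and need_download returns True, which is exactly the harmless false negative described above.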
Upvotes: 2