I'm trying to perform an S3 sync between prefixes in buckets in different accounts using boto3. My attempt proceeds by listing the objects in the source bucket/prefix in account A, listing the objects in the destination bucket/prefix in account B, and copying those objects in the former that have an ETag not matching the ETag of an object in the latter. This seems like the right way to do it.
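The comparison step described above can be sketched as a pure function over the `Contents` entries that boto3's `list_objects_v2` returns (dicts with `Key` and `ETag`). This is a minimal sketch under a few assumptions: the helper name `keys_to_copy` is hypothetical, objects are matched by their key relative to each prefix, and pagination and the actual `copy_object` calls are left out.

```python
from typing import Iterable, List, Mapping


def keys_to_copy(
    source_objects: Iterable[Mapping[str, object]],
    dest_objects: Iterable[Mapping[str, object]],
    source_prefix: str = "",
    dest_prefix: str = "",
) -> List[str]:
    """Hypothetical helper: return source keys that are missing from the
    destination or whose ETag differs from the destination object's ETag.

    Assumes each listing was made with Prefix=<prefix>, so every key
    starts with its prefix.
    """
    # Map each destination key (relative to its prefix) to its ETag.
    dest_etags = {
        str(obj["Key"])[len(dest_prefix):]: obj["ETag"] for obj in dest_objects
    }
    result = []
    for obj in source_objects:
        relative = str(obj["Key"])[len(source_prefix):]
        # Copy when there is no destination object or the ETags differ.
        if dest_etags.get(relative) != obj["ETag"]:
            result.append(str(obj["Key"]))
    return result
```

If the destination ETags change on every copy, as described in the rest of this question, this comparison reports every object as different on every run.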
However, even when the copy operation succeeds, the destination object's ETag is different every time I perform the copy. Specifically:
>>> # Here is the source object: {'Key': 'blah/blah/file_20210328_232250.parquet', 'LastModified': datetime.datetime(2021, 3, 28, 23, 38, 2, tzinfo=tzutc()), 'ETag': '"ba230f7a358cf1bee6c98250089da435"', 'Size': 52319, 'StorageClass': 'STANDARD'}
>>> client.copy_object(
CopySource={"Bucket": "source-bucket-in-acct-a", "Key": "blah/blah/file_20210328_232250.parquet"},
Bucket="dest-bucket-in-acct-b",
Key="blah/blah/file_20210328_232250.parquet"
)
... 'CopyObjectResult': {'ETag': '"84f11f744cf996e16a3af0d6d2fbee07"', 'LastModified': datetime.datetime(2021, 4, 20, 2, 23, 40, tzinfo=tzutc())}}
Notice that the ETag has changed. If I run the copy again, the ETag changes yet again. I've tried all manner of additional parameters to the copy request (MetadataDirective="COPY", etc.). I might be missing something that preserves the ETag, but my understanding is that the ETag is derived from the object's data, not its metadata.
The AWS documentation says that the ETags are identical after a successful non-multipart copy operation, which this is, but that does not seem to be the case here. It is not a multipart copy, and I've checked the actual data: the two objects are identical. Hence my question:
How can an object's ETag change, if not for an unsuccessful copy?
See: https://teppen.io/2018/10/23/aws_s3_verify_etags/#calculating-an-s3-etag-using-python
Note: if you simply copy the file via the AWS S3 web console, the part size is 16MB.
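The calculation described in the linked article can be sketched as follows. For a multipart upload or copy, the ETag is commonly observed to be the MD5 of the concatenated binary MD5 digests of each part, followed by `-` and the number of parts; for a single-part upload of unencrypted or SSE-S3 data it is the plain MD5 of the content. The function name `expected_etag` is hypothetical, and the part size you pass must match the one actually used, which is why the console's part size above matters.

```python
import hashlib
from typing import Optional


def expected_etag(data: bytes, part_size: Optional[int] = None) -> str:
    """Hypothetical helper: predict an S3 ETag (without surrounding quotes).

    part_size=None models a single-part PUT or copy; otherwise the data is
    treated as a multipart upload split into part_size-byte chunks.
    Not valid for SSE-KMS or SSE-C objects.
    """
    if part_size is None:
        # Single-part upload: ETag is the hex MD5 of the whole object.
        return hashlib.md5(data).hexdigest()
    # Raw 16-byte MD5 digest of each part, in order.
    digests = [
        hashlib.md5(data[i:i + part_size]).digest()
        for i in range(0, len(data), part_size)
    ]
    # Multipart ETag: MD5 of the concatenated digests plus "-<part count>".
    return hashlib.md5(b"".join(digests)).hexdigest() + f"-{len(digests)}"
```

To compare against a listing, strip the quotes from the listed ETag first, e.g. `obj["ETag"].strip('"') == expected_etag(local_bytes, part_size=16 * 1024 * 1024)`, where reading "16MB" as 16 MiB is an assumption.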
Mickey
Based on the comments.
The ETag calculation for an object is not consistent, so it can't always be used to check the integrity of objects. From the AWS blog:
ETag isn't always an MD5 digest, it can't always be used for verifying the integrity of uploaded files.
This is because the ETag calculation depends on how the object was created and encrypted:
Whether the ETag is an MD5 digest depends on how the object was created and encrypted.
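One way to act on this is to check, before treating an ETag as an MD5 digest, whether the object's metadata already rules that out. This is a sketch under stated assumptions: it takes the dict returned by boto3's `head_object`, the helper name `etag_is_md5` is hypothetical, and it only encodes the cases documented by AWS (multipart uploads, whose ETags contain `-`, and SSE-KMS or SSE-C encryption). A `True` result means the ETag should be an MD5 digest, not that it was verified as one.

```python
from typing import Any, Mapping


def etag_is_md5(head: Mapping[str, Any]) -> bool:
    """Hypothetical helper: True if the ETag in a head_object response
    is expected to be a plain MD5 digest of the object's data."""
    etag = str(head.get("ETag", "")).strip('"')
    # Multipart uploads (and multipart copies) produce "<hex>-<part count>".
    if "-" in etag:
        return False
    # SSE-KMS (including "aws:kms:dsse") objects do not have MD5 ETags.
    if str(head.get("ServerSideEncryption", "")).startswith("aws:kms"):
        return False
    # Objects encrypted with customer-provided keys (SSE-C) do not either.
    if "SSECustomerAlgorithm" in head:
        return False
    # A plain MD5 is 32 hex characters.
    return len(etag) == 32
```

When this returns `False` for either side of a copy, comparing ETags says nothing about whether the data matches, and another check (size, or a checksum you compute yourself) is needed.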