Reputation: 5841
Suppose I update an existing AWS S3 object.
aws s3 mv dataset.csv s3://my-bucket
and then right away, I submit a HEAD request to get the ETag.
aws s3api head-object --bucket my-bucket --key dataset.csv
Am I guaranteed to get the new, up-to-date ETag from HEAD, or is there a chance that AWS is still hashing the object even after aws s3 mv completes locally? More generally, where can I read about AWS ETag timing and availability guarantees?
As suggested in the comments below, I tried supplying my own ETag using the --content-md5 flag of put-object, but I ran into an error.
$ md5sum file
d41d8cd98f00b204e9800998ecf8427e file
$ aws s3api put-object --bucket my-bucket --key file --body a --content-md5 d41d8cd98f00b204e9800998ecf8427e
An error occurred (InvalidDigest) when calling the PutObject operation: The Content-MD5 you specified was invalid.
Upvotes: 1
Views: 1456
Reputation: 179284
Am I guaranteed to get the new up-to-date ETag from HEAD
A HEAD request returns the new object's ETag if the S3 node handling your request already has a version of the bucket index that is fresh enough to include the new object.
When you overwrite an object, S3 doesn't literally overwrite anything: it stores a copy at a new internal location and then changes its internal index record to point to the new internal object location. The bucket index is replicated, and at any instant each internal index replica lags the primary index by some amount of time, which can be and often is very close to zero, but is not guaranteed to be.
So until the index update has fully propagated, both GET and HEAD can return either the old or the new object, fully intact, never corrupt and never partially overwritten.
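If a caller needs to wait until the new version is visible, one option is to poll HEAD until the expected ETag shows up. The following is only a minimal sketch: the function name wait_for_etag, the default attempt count, and the default delay are assumptions made for illustration, and it assumes you already know the expected ETag (for example from the upload response).

```shell
# Poll HEAD until the expected ETag is visible, or give up.
# Usage: wait_for_etag BUCKET KEY EXPECTED_ETAG [ATTEMPTS] [DELAY_SECONDS]
# (hypothetical helper; defaults of 10 attempts and 1 second are arbitrary)
wait_for_etag() {
    bucket=$1
    key=$2
    expected=$3
    attempts=${4:-10}
    delay=${5:-1}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Strip any double quotes around the ETag value before comparing.
        current=$(aws s3api head-object --bucket "$bucket" --key "$key" \
            --query ETag --output text 2>/dev/null | tr -d '"')
        if [ "$current" = "$expected" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

A call such as wait_for_etag my-bucket dataset.csv "$new_etag" returns 0 once HEAD reports the new ETag and 1 if it never does within the allowed attempts.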
is there still a chance AWS is still hashing the object
Everything is atomic and immutable relative to the body and the metadata (including ETag). You would never see the new object body with the old ETag. There is no possible lag time there because S3 actually calculates and returns the new ETag as part of the upload response.
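Since the ETag comes back in the upload response, it can be captured at upload time instead of being fetched with a follow-up HEAD. A rough sketch, assuming the lower-level put-object call is used rather than aws s3 mv; the function names here are hypothetical:

```shell
# Upload a file and print the ETag from the PutObject response.
# Usage: put_and_get_etag BUCKET KEY FILE   (hypothetical helper)
put_and_get_etag() {
    aws s3api put-object --bucket "$1" --key "$2" --body "$3" \
        --query ETag --output text | tr -d '"'
}

# Print the ETag currently visible through HEAD, for comparison.
# Usage: head_etag BUCKET KEY   (hypothetical helper)
head_etag() {
    aws s3api head-object --bucket "$1" --key "$2" \
        --query ETag --output text | tr -d '"'
}
```

Both helpers strip any double quotes around the ETag value, so the two results can be compared directly as strings.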
If you absolutely need to receive the new version each time you access an object, you have to upload the object with a new object key. This is the only immediate consistency guarantee made by S3. Everything else is eventual; however, ETag and object body won't be out of sync with each other.
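One way to get a new key for every upload is to derive the key from the file's content. This is just an illustration: the hash-prefix naming scheme and the function name versioned_key are assumptions, not anything S3 prescribes.

```shell
# Build an object key that embeds the file's MD5, so changed content
# always lands under a new key: <md5hex>/<basename>
# (hypothetical naming scheme, chosen only for illustration)
versioned_key() {
    file=$1
    hash=$(md5sum "$file" | cut -d' ' -f1)
    printf '%s/%s\n' "$hash" "$(basename "$file")"
}
```

The result can then be passed as the key, for example aws s3api put-object --bucket my-bucket --key "$(versioned_key dataset.csv)" --body dataset.csv. Note that identical content maps to the same key under this scheme.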
Upvotes: 3
Reputation: 3365
You can't control the ETag, and you can't even know what it will be before the upload is finished. The --content-md5 flag is there to verify the upload of the object, but S3 will then calculate the final ETag value (ref).
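As for the InvalidDigest error in the question: Content-MD5 has to be the base64 encoding of the raw 16-byte MD5 digest, not the hex string that md5sum prints. A minimal sketch, assuming openssl and base64 are installed; the function names are hypothetical:

```shell
# Print the base64-encoded binary MD5 digest of a file, which is the
# form the Content-MD5 header expects.
content_md5() {
    openssl dgst -md5 -binary "$1" | base64
}

# Upload a file with integrity checking.
# Usage: put_with_md5 BUCKET KEY FILE   (hypothetical helper)
put_with_md5() {
    aws s3api put-object --bucket "$1" --key "$2" --body "$3" \
        --content-md5 "$(content_md5 "$3")"
}
```

For the empty file in the question, whose hex digest is d41d8cd98f00b204e9800998ecf8427e, content_md5 prints 1B2M2Y8AsgTpgAmY7PhCfg==.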
GET and HEAD requests are read-after-write consistent, so if you create a new object and then get it, you'll always get the right one. The docs do not mention ETag specifically, but it is part of the GET response, so I suppose it is always up to date as well (docs). One caveat is that this consistency model does not hold in two cases:
Upvotes: 1