Reputation: 656
I see from Amazon's documentation that writing a new object to S3 is read-after-write consistent, but that update and delete operations are eventually consistent. I would guess that pushing a new version of an object with versioning turned on would be eventually consistent like an update, but I can't find any documentation to confirm. Does anyone know?
Edit: My question is regarding the behavior of a GET with or without an explicit version specified.
I'd really like read-after-write behavior on updates for my project, which I may be able to simulate by doing inserts only, but it might be easier if pushing new versions of an object provided the desired behavior.
Upvotes: 4
Views: 2510
Reputation: 91
A GET that specifies a version ID is always strongly consistent for objects in a versioning-enabled bucket.
Upvotes: 1
Reputation: 179054
As you already know...
Q: What data consistency model does Amazon S3 employ?
Amazon S3 buckets in all Regions provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES.
...and that's about all there is, as far as official statements on the consistency model.
However, I would suggest that the remainder can be extrapolated with a reasonable degree of certainty from this, along with assumptions we can reasonably make, plus some additional general insights into the inner workings of S3.
For example, we know that S3 does not actually store the objects in a hierarchical structure, yet:
Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index.
http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
This implies that S3 has at least two discrete major components: a backing store where the data is persisted, and an index of keys pointing to locations in the backing store. We also know that both of them are distributed across multiple availability zones and thus both of them are replicated.
The fact that the backing store is separate from the index is not a foregone conclusion until you remember that storage classes are selectable on a per-object basis, which almost necessarily means that the index and the data are stored separately.
From the fact that overwrite PUT operations are eventually-consistent, we can conclude that even in a non-versioned bucket, an overwrite is not in fact an overwrite of the backing store, but rather an overwrite of the index entry for that object's key, and an eventual freeing of the space in the backing store that's no longer referenced by the index.
The implication I see in these assertions is that the indexes are replicated, and it's possible for a read-after-overwrite (or delete) to hit a replica of the index that does not yet reflect the most recent overwrite. But when a read encounters a "no such key" condition in its local index, the system pursues a more resource-intensive path of interrogating the "master" index (whatever that may actually mean in the architecture of S3) to see whether such an object really does exist and the local index replica simply hasn't learned of it yet.
Since the first GET of a new object that has not replicated to the appropriate local index replica is almost certainly a rare occurrence, it is reasonable to expect that the architects of S3 made this allowance for a higher cost "discovery" operation to improve the user experience, when a node in the system believes this may be the condition it is encountering.
From all of this, I would suggest that the most likely behavior you would experience would be this:
GET without a versionId on a versioned object after an overwrite PUT would be eventually-consistent, since the node servicing the read request would not encounter the No Such Key condition, and would therefore not follow the theoretical higher-cost "discovery" model I speculated about above.
GET with an explicit request for the newest versionId would be immediately consistent on an overwrite PUT, since the reading node would likely launch the high-cost strategy to obtain upstream confirmation of whether its index reflected all the most-current data, although of course the condition here would be No Such Version, rather than No Such Key.
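Since this is speculation, the two cases are worth comparing empirically. The following is only a minimal sketch, assuming a boto3-style S3 client (for example `boto3.client("s3")`) and a version ID captured from an earlier overwrite PUT; the helper name `compare_reads` is hypothetical, and a single run proves nothing about consistency guarantees.

```python
def compare_reads(s3, bucket, key, version_id):
    """Issue both kinds of GET and report what each one returned.

    `s3` is assumed to behave like a boto3 S3 client; `version_id` is the
    version ID returned by the overwrite PUT being checked.
    """
    # GET without a versionId: returns whatever the serving node believes
    # is the current version of the key.
    latest = s3.get_object(Bucket=bucket, Key=key)

    # GET with an explicit versionId: asks for exactly the version just written.
    explicit = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)

    return {
        "unversioned_get_version": latest.get("VersionId"),
        "explicit_get_version": explicit.get("VersionId"),
        # True when the plain GET already reflects the new version.
        "unversioned_get_is_current": latest.get("VersionId") == version_id,
    }
```

Calling this in a loop immediately after many overwrite PUTs, and counting how often `unversioned_get_is_current` is false, would give at least anecdotal evidence either way.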
I know speculation is not what you were hoping for, but absent documented confirmation or empirical (or maybe some really convincing anecdotal) evidence to the contrary, I suspect this is the closest we can come to drawing credible conclusions based on the publicly-available information about the S3 platform.
Upvotes: 4
Reputation: 10566
I would not assume anything.
What I would do is capture the version ID (returned in the x-amz-version-id header) from the PUT response and issue a GET (or, even better, a HEAD) for that version to ensure that the object was indeed persisted and is visible in S3.
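A minimal sketch of that check, assuming a boto3-style S3 client (for example `boto3.client("s3")`), which exposes the x-amz-version-id header as `VersionId` in the PUT response. The helper name `put_and_confirm` and the retry parameters are hypothetical choices for illustration.

```python
import time


def put_and_confirm(s3, bucket, key, body, attempts=5, delay=0.5):
    """PUT an object, then HEAD the returned version until it is visible.

    `s3` is assumed to behave like a boto3 S3 client. Returns the version ID,
    or raises TimeoutError if the version never becomes visible.
    """
    resp = s3.put_object(Bucket=bucket, Key=key, Body=body)
    version_id = resp.get("VersionId")  # absent when versioning is not enabled
    if version_id is None:
        raise RuntimeError("no VersionId returned; is versioning enabled?")

    for attempt in range(attempts):
        try:
            # HEAD avoids downloading the body; we only need to know it exists.
            s3.head_object(Bucket=bucket, Key=key, VersionId=version_id)
            return version_id
        except Exception as exc:
            # boto3 raises botocore.exceptions.ClientError, which carries a
            # `response` dict. Only "not found" style errors are retried.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            if code not in ("404", "NoSuchKey", "NoSuchVersion"):
                raise
        if attempt < attempts - 1:
            time.sleep(delay)

    raise TimeoutError(f"version {version_id} of {key} not visible after {attempts} attempts")
```

Subsequent reads can then pass the returned version ID explicitly instead of relying on what an unversioned GET happens to return.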
Upvotes: 0