Reputation: 6514
Note: I asked a very similar question to this previously, but was not clear enough on exactly what I was looking for, and marked an answer too aggressively. I am looking for a confirmed yes/no on a specific point.
I want to build an automated job that performs offline processing on DocumentDb documents by querying DocumentDb on a schedule, looking for documents that have changed since the last time the check was performed.
Given the metadata available in DocumentDb, it looks like the way to do this would be to query for documents whose _ts is greater than the largest _ts seen on the previous run, ordered by _ts, and to record the new largest _ts as the checkpoint for the next run.
My question is: is this guaranteed to work? Is it guaranteed that this will not miss any documents? As far as I can tell, it comes down to the transactional semantics around _ts within DocumentDb's implementation, which are not documented at this level of detail. I want to know whether it is guaranteed that no document can be updated with a _ts value lower than the largest _ts returned by a query that returns the most recently changed document in the collection.
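For concreteness, a minimal sketch of the kind of polling query described above (the azure-cosmos Python SDK calls, the placeholder endpoint/key/database/container names, and the exact query text are assumptions for illustration, not anything from the DocumentDb documentation):

```python
# Sketch of the scheduled "what changed since last run?" job.
# Assumes the azure-cosmos Python SDK; endpoint, key, and names are placeholders.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("mydb").get_container_client("mycoll")

def poll_changes(last_ts):
    """Return documents changed since the previous run plus the new checkpoint."""
    query = "SELECT * FROM c WHERE c._ts > @lastTs ORDER BY c._ts ASC"
    docs = list(container.query_items(
        query=query,
        parameters=[{"name": "@lastTs", "value": last_ts}],
        enable_cross_partition_query=True,
    ))
    new_checkpoint = max((d["_ts"] for d in docs), default=last_ts)
    return docs, new_checkpoint
```

Whether that loop can silently skip a document is exactly the question below.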
EDIT, prompted by David's comment:
To be a little more precise, with a couple of specific scenarios:
Upvotes: 0
Views: 601
Reputation: 9523
With the default consistency level this is not guaranteed to work, because a document with a lower _ts can show up later. However, if you can guarantee that your update requests are far enough apart (say, 60 seconds), then the risk is very low.
I don't think David's edge case is a worry so long as you treat every document with a higher _ts as new.
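One way to act on that advice, purely as a sketch (the lookback window size, the (id, _ts) dedupe key, and the `process` placeholder for your offline processing step are illustrative choices, not anything DocumentDb prescribes): re-scan a small window behind the checkpoint and make the processing idempotent so late-arriving lower _ts values are picked up without double work.

```python
# Sketch: tolerate late-arriving lower _ts values by re-scanning a lookback
# window behind the checkpoint and skipping (id, _ts) pairs already processed.
LOOKBACK_SECONDS = 300  # arbitrary safety margin, tune to your write pattern

def poll_with_lookback(container, last_ts, seen):
    query = "SELECT * FROM c WHERE c._ts >= @since ORDER BY c._ts ASC"
    docs = container.query_items(
        query=query,
        parameters=[{"name": "@since", "value": last_ts - LOOKBACK_SECONDS}],
        enable_cross_partition_query=True,
    )
    new_checkpoint = last_ts
    for doc in docs:
        key = (doc["id"], doc["_ts"])
        if key not in seen:          # a higher _ts for the same id is a new version
            seen.add(key)
            process(doc)             # placeholder for your offline processing step
        new_checkpoint = max(new_checkpoint, doc["_ts"])
    return new_checkpoint
```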
You might also want to consider an append-only approach using something like Richard Snodgrass' temporal model. That makes the idempotency semantics easier.
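A minimal sketch of what such an append-only shape could look like (the field names and the versioning convention are illustrative): every change is written as a new immutable version document instead of updating the existing one, so the offline processor only ever sees inserts.

```python
# Sketch of an append-only, Snodgrass-style versioned document: each change
# becomes a new immutable document rather than an update of the old one.
import time
import uuid

def make_version_doc(entity_id, payload):
    return {
        "id": str(uuid.uuid4()),          # unique per version, never overwritten
        "entityId": entity_id,            # logical identity shared by all versions
        "validFrom": int(time.time()),    # transaction-time start, epoch seconds
        "payload": payload,
    }
```

Because versions are never mutated, reprocessing the same version twice is harmless, which is where the easier idempotency semantics come from.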
Upvotes: 1