Reputation: 241
I was messing around with the Azure Cosmos DB (via .NET SDK) and noticed something odd.
Normally when I request a query page by page using continuation tokens, I never get documents that were created after the first continuation token had been created. I can observe changed documents, lack of removed (or rather newly filtered out) documents, but not the new ones. However, if I only allow 1kB continuation tokens (the smallest I can set), I get the new documents as well. As long as they end up sorted to the remaining pages, obviously.
This kind of makes sense, since with the size limit, I prevent the Cosmos DB from including the serialized index lookup and whatnot in the continuation token. As a downside, the Cosmos DB has to recreate the resume state for every page I request, what will cost some extra RUs. At least according to this discussion. As a side-effect, new documents end up in the result.
Now, I actually have a couple of questions in regards to this.
Upvotes: 9
Views: 2553
Reputation: 1401
I am from the CosmosDB Engineering Team.
We brought in this feature (limiting continuation token size) due to an ask from customers to help in reducing the response continuation size. We are of the opinion that it's too much detail to expose the effects of pruning the continuation, since for most customers the subtle behavior change shouldn't matter.
This depends on the amount of work done in producing the state from the index. For example, if we had to evaluate a range predicate (e.g. _ts > some discrete second), then the RU saved could be significant, since we potentially avoid scanning a whole bunch of index keys corresponding to _ts (this could be O(number of documents), assuming the worst case of having inserted at most 1 document per second). In this scenario, assuming X continuations, we save (X - 1) * O(number of documents) worth of work.
No, not unless you force CosmosDB to re-evaluate the index every continuation by setting the header to 1. Typically queries are meant to be executed fairly quickly over continuations, so the chance of users seeing new documents should be fairly small. Ideally we should implement snapshot isolation to retrieve results with the session token from the first continuation, but we haven't done this yet.
Your assumptions are spot on :)
Upvotes: 13