Guy Assaf

Reputation: 961

How to count all docs in Azure DocumentDB

The following SP is an attempt to count all documents in the collection and, more generally, to learn how to process the complete collection.

For some reason, the following SP returns

{"count":0,"QueryCount":0}

while I would expect it to return

{"count":1000, "QueryCount":1}

SP:

function CountAll(continuationToken) {
    var collection = getContext().getCollection();
    var results = 0;
    var queryCount = 0;
    var pageSize = 1000;
    var responseOptionsContinuation;
    var accepted = true;

    var responseOptions = { continuation: continuationToken, pageSize: pageSize };

    if (accepted) {
        accepted = collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);
        responseOptions.continuation = responseOptionsContinuation;
    }
    setBody();

    function onReadDocuments(err, docFeed, responseOptions) {
        queryCount++;
        if (err) {
            throw 'Error while reading document: ' + err;
        }

        results += docFeed.length;
        responseOptionsContinuation = responseOptions.continuation;
    }

    function setBody() {
        var body = { count: results, QueryCount: queryCount };
        getContext().getResponse().setBody(body);
    }
}

Upvotes: 1

Views: 998

Answers (2)

Aravind Krishna R.

Reputation: 8003

Note that DocumentDB now returns the total count of documents in a response header. You can retrieve it as an O(1) operation by calling GET /colls/collectionName (ReadDocumentCollectionAsync in .NET):

The server already returns this information today, but unfortunately the SDK doesn't expose the property yet. We will fix this in the next refresh of the SDK. Until then, you could try this:

ResourceResponse<DocumentCollection> collectionReadResponse = await client.ReadDocumentCollectionAsync(…);
String quotaUsage = collectionReadResponse.ResponseHeaders["x-ms-resource-usage"];

// The quota usage is a semicolon (;) delimited list of key=value pairs.
// The key "documentsCount" holds the actual count of documents.

Here's what the header looks like.

"functions=0;storedProcedures=0;triggers=0;documentSize=10178;documentsSize=5781669;documentsCount=17151514;collectionSize=10422760";

In this example, the count of documents is ~17M (17151514).
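If you are reading the header from JavaScript rather than .NET, it can be split apart with plain string handling. The sketch below assumes you already have the raw x-ms-resource-usage value as a string (how you obtain it depends on your client library); parseResourceUsage is a hypothetical helper, not part of any SDK, and the sample value is the one shown above.

```javascript
// Hypothetical helper: parse the semicolon-delimited "x-ms-resource-usage"
// header into an object of numbers. Assumes every entry has the form key=value.
function parseResourceUsage(header) {
    const usage = {};
    for (const entry of header.split(";")) {
        const trimmed = entry.trim();
        if (trimmed === "") continue; // tolerate a trailing semicolon
        const eq = trimmed.indexOf("=");
        if (eq === -1) continue; // skip malformed entries instead of throwing
        usage[trimmed.slice(0, eq)] = Number(trimmed.slice(eq + 1));
    }
    return usage;
}

// Sample header value from the example above.
const header =
    "functions=0;storedProcedures=0;triggers=0;documentSize=10178;" +
    "documentsSize=5781669;documentsCount=17151514;collectionSize=10422760";
const usage = parseResourceUsage(header);
console.log(usage.documentsCount); // 17151514
```

Keys are not hard-coded, so the helper keeps working if the server adds further entries to the header.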

Upvotes: 5

Larry Maccherone

Reputation: 9523

You are on the right track; you just need a few tweaks. Your trouble seems to be in the way you are writing async code. It took me a while to get used to writing async code in JavaScript. I'm sure you'll get it. Here are the things I notice:

  • I don't see anything in your callback onReadDocuments() that will attempt another query after it returns with a 1000-document page. Inside onReadDocuments(), you need to test that the continuation token is not null and that accepted is still true. If both conditions are met, execute this statement again: accepted = collection.readDocuments(collection.getSelfLink(), responseOptions, onReadDocuments);

  • Also, the line right after your readDocuments() call is probably not doing what you expect: responseOptions.continuation = responseOptionsContinuation; It's unnecessary there, because you already set responseOptions.continuation above it, and responseOptionsContinuation won't receive a new value until the callback is called, which happens after this line has run.

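  • Naming the last parameter of onReadDocuments() responseOptions is confusing, because it holds the options returned with the response (including the new continuation token), not the request options you submitted. It also shadows your outer responseOptions variable. Rename it to just options.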

  • You seem to have three different ways of referring to the continuation token, and you don't consistently pass in the one you set. Suggestion: rename the sproc's parameter from continuationToken to continuationTokenForThisSPROCExecution. You already initialize responseOptions with it, so that's good; just update it to the new name. Then, inside onReadDocuments(), execute responseOptions.continuation = options.continuation;

  • Just to be sure you understand: the sproc can request many 1000-document pages before it times out (at least 10,000 on an unloaded system, in my experience). The changes above take that into account, but if the sproc does time out, you'll need to handle it a bit differently, which involves some work on the client side. Pass the most recent continuation token back in the body, and on the client side, if you see a response with a continuation token, call the sproc again with that token. You'll then either need to pass the current count back in to the sproc so it can keep adding to it, or accumulate the count on the client side.
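Putting those bullets together, a corrected stored procedure might look like the sketch below. It has not been run against a live collection; the extra countSoFar parameter and the continuation/done fields in the response body are assumptions added so a client can resume after a timeout. The essential change is that setBody() is only called from inside the callback chain (or when readDocuments() is no longer accepted), rather than synchronously after the first request, which is presumably why the original returned a count of 0.

```javascript
// Sketch only: corrected CountAll stored procedure (untested against a live collection).
// countSoFar and the continuation/done fields of the body are assumptions added so
// that a client can resume after the sproc runs out of time.
function CountAll(continuationToken, countSoFar) {
    var collection = getContext().getCollection();
    var count = countSoFar || 0;
    var queryCount = 0;
    // A client may pass null to mean "start at the beginning"; normalize to undefined.
    var requestOptions = { continuation: continuationToken || undefined, pageSize: 1000 };

    readNextPage();

    function readNextPage() {
        var accepted = collection.readDocuments(collection.getSelfLink(), requestOptions, onReadDocuments);
        if (!accepted) {
            // The server will not take more requests in this execution:
            // return the running total and the token of the next unread page.
            setBody(false);
        }
    }

    function onReadDocuments(err, docFeed, options) {
        if (err) {
            throw new Error('Error while reading documents: ' + err);
        }
        queryCount++;
        count += docFeed.length;

        if (options.continuation) {
            // More pages remain: remember where to resume and request the next one.
            requestOptions.continuation = options.continuation;
            readNextPage();
        } else {
            // No continuation token: every document has been counted.
            setBody(true);
        }
    }

    // Called only from inside the callback chain (or when a request is refused),
    // never synchronously right after the first readDocuments() call.
    function setBody(done) {
        getContext().getResponse().setBody({
            count: count,
            QueryCount: queryCount,
            continuation: done ? null : requestOptions.continuation,
            done: done
        });
    }
}
```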

Here is a fully worked-out example in CoffeeScript (which compiles to JavaScript). Note that if you use documentdb-utils, it will keep calling the sproc until it's done; otherwise, you'll need to do that yourself.
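Without documentdb-utils, the client-side loop can be sketched as follows. It assumes the sproc returns a body with a cumulative count, a QueryCount, the continuation token to resume from, and a done flag once the collection is exhausted (an assumed shape, not something DocumentDB defines). executeSproc is a hypothetical stand-in for whatever your SDK uses to execute a stored procedure; it must return a promise of that body.

```javascript
// Keep re-executing the count sproc until it reports that it is done.
// executeSproc(continuation, countSoFar) is hypothetical: it should run the
// stored procedure with those two arguments and resolve to its response body.
async function countAllDocuments(executeSproc) {
    let continuation = null; // null: start at the beginning of the collection
    let count = 0;
    for (;;) {
        // Pass the running total back in so the sproc keeps adding to it.
        const body = await executeSproc(continuation, count);
        count = body.count;
        if (body.done) {
            return count;
        }
        if (body.QueryCount === 0) {
            // The execution could not read even one page; stop instead of looping forever.
            throw new Error("Stored procedure made no progress");
        }
        continuation = body.continuation;
    }
}
```

The QueryCount check matters because an execution that cannot read a single page would otherwise be re-invoked forever with the same token.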

Upvotes: 1
