cjsmith411

Reputation: 41

Microsoft.Azure.Documents.Client for Azure Cosmos multiple calls

I'm trying to understand why Microsoft.Azure.Documents.Client makes multiple calls when running a query.

var option = new FeedOptions { EnableCrossPartitionQuery = true, MaxItemCount = 100 };

var myobj = cosmosClient.CreateDocumentQuery<myCosmosObj>(documentUri, option)
    .Where(x => x.ID == request.Id);

while (myobj.AsDocumentQuery().HasMoreResults)
{
    var results = await myobj.AsDocumentQuery().ExecuteNextAsync<myCosmosObj>();

    resultList.AddRange(results);
}

A Fiddler trace shows 5 calls to the Cosmos collection dbs/mycollectionname/colls/docs (the while loop above runs 5 times).

A single network hop would improve performance, so I would like to understand why it is making 5 network calls, and whether there is something I need to change in the configuration to adjust this. I have already tried adjusting the ResultSize. This is roughly a 3 GB collection.

Upvotes: 1

Views: 445

Answers (2)

Nick Chapsas

Reputation: 7200

David's answer is theoretically correct; however, it is missing a crucial point.

Your code is wrong. The way you create the document query inside the loop means that you will always query the result of the first execution 5 times.

The code should actually be like this:

var query = cosmosClient.CreateDocumentQuery<myCosmosObj>(documentUri, option)
        .Where(x => x.ID == request.Id).AsDocumentQuery();

while (query.HasMoreResults)
{
    var results = await query.ExecuteNextAsync<myCosmosObj>();
    resultList.AddRange(results);
}

This will now run your query properly: each call to ExecuteNextAsync uses the continuation state held by the query object to read the next page.

Upvotes: 2

David Makogon

Reputation: 71035

With a partitioned collection, the most efficient way to find a document by id is to also specify the partition key, which directs your query to a single partition. Without the partition key, there's really no way to know, up front, which partition your documents reside in. That's likely why you're seeing 5 calls (you likely have 5 partitions).

The alternative, which your code shows, is to do a cross-partition query, which has to do one query per partition, to seek the document you're looking for.

One more thing to note: a query will have a higher RU cost than a read. And if you already know the partition key and id, there's no need to invoke the query engine at all (you can only retrieve a single document anyway for a given partition key + id combination).
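As a rough sketch of both options, assuming the 2.x Microsoft.Azure.Documents.Client SDK: the first method does a point read by id plus partition key, and the second runs a query scoped to one partition instead of fanning out. The CosmosLookups class, its method names, and the databaseId / collectionId / partitionKeyValue parameters are placeholders I made up for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

// Hypothetical helper class; names and parameters are illustrative only.
public static class CosmosLookups
{
    // Point read: fetches one document by id + partition key without
    // going through the query engine.
    public static async Task<T> ReadByIdAsync<T>(
        DocumentClient client, string databaseId, string collectionId,
        string id, object partitionKeyValue)
    {
        Uri documentUri = UriFactory.CreateDocumentUri(databaseId, collectionId, id);
        var options = new RequestOptions
        {
            PartitionKey = new PartitionKey(partitionKeyValue)
        };

        DocumentResponse<T> response =
            await client.ReadDocumentAsync<T>(documentUri, options);
        return response.Document;
    }

    // Single-partition query: setting FeedOptions.PartitionKey routes the
    // query to one partition, so EnableCrossPartitionQuery is not needed.
    public static async Task<List<T>> QuerySinglePartitionAsync<T>(
        DocumentClient client, string databaseId, string collectionId,
        object partitionKeyValue, Expression<Func<T, bool>> predicate)
    {
        Uri collectionUri =
            UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
        var options = new FeedOptions
        {
            PartitionKey = new PartitionKey(partitionKeyValue)
        };

        // Build the IDocumentQuery once, outside the loop, and reuse it.
        IDocumentQuery<T> query = client
            .CreateDocumentQuery<T>(collectionUri, options)
            .Where(predicate)
            .AsDocumentQuery();

        var results = new List<T>();
        while (query.HasMoreResults)
        {
            results.AddRange(await query.ExecuteNextAsync<T>());
        }
        return results;
    }
}
```

Usage would look something like `await CosmosLookups.ReadByIdAsync<myCosmosObj>(cosmosClient, "mydb", "docs", request.Id, partitionKeyValue)`, where "mydb" and partitionKeyValue stand in for your own values. If the document doesn't exist, the point read throws a DocumentClientException with StatusCode NotFound, so you may want to catch that.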

Upvotes: 1
