Travis Pettry

Reputation: 1352

Cosmos DB Speed Up Reads

I am trying to retrieve about 10,000 items from Cosmos DB. It took about 30 seconds to save the data, but it is taking about 50 seconds to retrieve it. Each record is about 6 KB in size.

string sqlQueryText = $"SELECT * FROM c WHERE c.FK IN (1,2,3,4,5,6,7,...N)";
QueryDefinition queryDefinition = new QueryDefinition(sqlQueryText);

// Query a single partition with up to 20 parallel fetches and 2,000 items per page.
FeedIterator<MyObject> myFeedIterator = Container.GetItemQueryIterator<MyObject>(
    queryDefinition,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = pk,
        MaxConcurrency = 20,
        MaxItemCount = 2000
    });

List<MyObject> myObjects = new List<MyObject>();

// Drain the iterator one page at a time.
while (myFeedIterator.HasMoreResults)
{
    Microsoft.Azure.Cosmos.FeedResponse<MyObject> page = await myFeedIterator.ReadNextAsync();

    foreach (MyObject item in page)
    {
        myObjects.Add(item);
    }
}

Does anyone know of a way I can speed up this query?

Thank you, Travis Pettry

Upvotes: 0

Views: 1415

Answers (1)

Mark Brown

Reputation: 8763

Because your items are so large, this query may always suffer from long run times: 10,000 items × 6 KB is about 60 MB of data. The page size for each fetch is 4 MB, so it will take roughly 15 round trips to completely drain the query. The effective maximum for MaxConcurrency is the number of physical partitions you have, so you can simply set it to -1 and let the SDK decide. MaxItemCount is likewise bounded by the page size: 4 MB / 6 KB works out to roughly 660 items per batch.
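For what it's worth, a minimal sketch of your iterator with those knobs changed; -1 tells the SDK to pick the values itself (queryDefinition, pk, and MyObject are from your question):

FeedIterator<MyObject> iterator = Container.GetItemQueryIterator<MyObject>(
    queryDefinition,
    requestOptions: new QueryRequestOptions
    {
        PartitionKey = pk,     // single-partition query where possible
        MaxConcurrency = -1,   // SDK decides, capped by physical partition count
        MaxItemCount = -1      // SDK decides, capped by the 4 MB page (~660 items at 6 KB)
    });

List<MyObject> myObjects = new List<MyObject>();
while (iterator.HasMoreResults)
{
    FeedResponse<MyObject> page = await iterator.ReadNextAsync();
    myObjects.AddRange(page);  // FeedResponse<T> is enumerable, so AddRange works
}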

As far as performance goes, I would consider the following:

Reevaluate your data model to see if you really need 6 KB records. If you do a large volume of reads but only touch a subset of the data, you should shred your documents into two or more smaller documents. This is especially true if you also do a high volume of inserts, and even more so if you do a high volume of updates, because each update, however small, replaces the entire 6 KB document.
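As an illustration of the shredding idea, here is a hypothetical split of one 6 KB document into a small document that hot queries read and a detail document fetched only on demand; the class and property names are invented for this sketch, and it assumes both documents share the same partition key value:

// Hypothetical shredded model: both documents keep the same partition key,
// so they can still be fetched together with a single in-partition query.
public class MyObjectSummary
{
    public string id { get; set; }           // e.g. "{objectId}-summary"
    public string FK { get; set; }           // partition key value
    public string Name { get; set; }         // small, frequently read fields only
}

public class MyObjectDetail
{
    public string id { get; set; }           // e.g. "{objectId}-detail"
    public string FK { get; set; }           // same partition key as the summary
    public string LargePayload { get; set; } // the bulk of the original 6 KB
}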

The other best thing you can do is remodel your data so the query is not cross-partition. This is especially true if you run this query very frequently or need very fast performance. If your current partition key is required to optimize for writes but you also run a high volume of queries, consider using change feed to keep two copies of the data: one optimized for writes and another used to answer queries.
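A minimal sketch of that pattern with the .NET SDK's change feed processor; the container instances (writeContainer, readContainer, leaseContainer), the processor and instance names, and the reuse of MyObject as the read model are all assumptions for illustration:

// Sketch: project writes from the write-optimized container into a
// read-optimized container whose partition key matches the query pattern.
ChangeFeedProcessor processor = writeContainer
    .GetChangeFeedProcessorBuilder<MyObject>(
        processorName: "readModelProjection",
        onChangesDelegate: async (IReadOnlyCollection<MyObject> changes, CancellationToken ct) =>
        {
            foreach (MyObject item in changes)
            {
                // Upsert each changed item into the query-optimized container.
                await readContainer.UpsertItemAsync(item, cancellationToken: ct);
            }
        })
    .WithInstanceName("worker-1")
    .WithLeaseContainer(leaseContainer)
    .Build();

await processor.StartAsync();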

Upvotes: 2
