Azure Cosmos DB - incorrect and variable document count

Question

I have inserted exactly 1 million documents in an Azure Cosmos DB SQL container using the Bulk Executor. No errors were logged. All documents share the same partition key. The container is provisioned for 3,200 RU/s, unlimited storage capacity and single-region write.

When performing a simple count query:

select value count(1) from c where c.partitionKey = @partitionKey

I get results that vary from 303,000 to 307,000.

This count query works fine for smaller partitions (from 10k up to 250k documents).

What could cause this strange behavior?

Jay Gong · Accepted Answer

This behavior is expected in Cosmos DB. First, you need to know that DocumentDB imposes limits on the response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?

Second, if you want to query a large amount of data from DocumentDB, you have to consider query performance. Please refer to this article: Tuning query performance with Azure Cosmos DB.

Looking at the DocumentDB REST API, you can see two parameters that have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.

So the result you see is caused by the RU setting acting as a bottleneck. The count query is limited by the number of RUs allocated to your collection, so the response you received holds only a partial count and comes with a continuation token.

You have two options:

1. You could raise the RU setting.

2. To keep costs down, you could keep fetching the next set of results via the continuation token and add the partial counts together to get the total count (probably in the SDK).

You could set the value of Max Item Count and paginate your data using continuation tokens. The DocumentDB SDK supports reading paginated data seamlessly. You could refer to the Python snippet below:

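# Run the query with a page size of 10 items
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
# Fetch the first page; index 1 of the result holds the response headers
results_1 = q._fetch_function({'maxItemCount': 10})
# The continuation token is a string representing a JSON object
token = results_1[1]['x-ms-continuation']
# Fetch the next page by passing the token back
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})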

I imported exactly 30k documents into my database. Then I ran the query

select value count(1) from c

in Query Explorer. Each page returned only a partial count of the total documents, so I had to click the Next Page button repeatedly and add the counts together.

Of course, you could also run this query in SDK code and follow the continuation token there.
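The accumulation itself could look roughly like the sketch below. It is not tied to any SDK: fetch_page is a hypothetical callable that takes a continuation token (None for the first page) and returns the partial counts of that page together with the next token, and make_fake_fetcher is an in-memory stand-in with made-up page values so the loop can run without a database. In real code, fetch_page would wrap whatever paged query call your SDK version provides.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher (an assumption, not an SDK type): takes a
# continuation token (None for the first page) and returns
# (partial_counts, next_token). next_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[int], Optional[str]]]


def total_count(fetch_page: FetchPage) -> int:
    """Sum the partial counts of every page until no token is returned."""
    total = 0
    token: Optional[str] = None
    while True:
        partial_counts, token = fetch_page(token)
        total += sum(partial_counts)
        if token is None:
            return total


def make_fake_fetcher(pages: List[int]) -> FetchPage:
    """In-memory stand-in for the service; tokens are page indices as strings."""
    def fetch_page(token: Optional[str]) -> Tuple[List[int], Optional[str]]:
        index = 0 if token is None else int(token)
        next_token = str(index + 1) if index + 1 < len(pages) else None
        return [pages[index]], next_token
    return fetch_page


if __name__ == "__main__":
    # Made-up partial counts that add up to one million documents.
    fetcher = make_fake_fetcher([303000, 307000, 390000])
    print(total_count(fetcher))
```

The key point is that a single response is only one term of the sum; the loop must continue until the service stops returning a continuation token.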
