cyberabis

Reputation: 330

DocumentDB read latency when queried across partitions

I created two empty DocumentDB collections: one single-partition and one multi-partition. I then inserted a single document into each and ran a scan (select * from c). The single-partition query cost ~2 RUs, whereas the multi-partition query cost ~50 RUs. It's not just the RUs: read latency was also about 20x higher on the multi-partition collection. Do queries across partitions on a multi-partition collection always have high read latency?

Upvotes: 0

Views: 491

Answers (1)

Aravind Krishna R.

Reputation: 8003

You can get the same latency for multi-partition collections as single-partition collections. Let's take the example of scans:

  • If the collections are non-empty, performance will be the same, because the data is read from one of the partitions. Data is read from the first partition first, then paginated across the remaining partitions in order.
  • If you use the MaxDegreeOfParallelism option, you'll get the same low latency. Query execution is serial by default, in order to optimize for queries over larger datasets; with the parallelism option, partitions are queried concurrently.
  • If you scan with a filter on partition key = value, you'll get the same performance even without parallelism.
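
To see why serial execution hurts most on a nearly empty collection, here is a toy latency model in Python. It is not DocumentDB SDK code: the function name, the batch-based scheduling, and the 10 ms round-trip time are assumptions made for illustration, and the figure of 25 partitions is only a guess that would be consistent with ~50 RUs at ~2 RU per partition.

```python
from typing import List


def fanout_latency_ms(round_trips_ms: List[float], parallelism: int = 1) -> float:
    """Estimate wall-clock latency for a query that must visit every partition.

    Toy model (assumption): partitions are visited in fixed-size batches of
    `parallelism`, and each batch takes as long as its slowest round trip.
    parallelism=1 corresponds to the serial default.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    total = 0.0
    for start in range(0, len(round_trips_ms), parallelism):
        batch = round_trips_ms[start:start + parallelism]
        total += max(batch)
    return total


# Hypothetical numbers: 25 partitions, 10 ms per round trip.
trips = [10.0] * 25
serial_ms = fanout_latency_ms(trips, parallelism=1)     # one partition at a time
parallel_ms = fanout_latency_ms(trips, parallelism=25)  # all partitions at once
print(serial_ms, parallel_ms)
```

Under these assumptions the serial scan pays one round trip per partition, while the fully parallel scan pays roughly one round trip in total, which is the gap the parallelism option is meant to close.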

It is true that there is a small RU overhead for each partition touched during a query (~2 RU per partition for query parsing). Note that this overhead doesn't grow with the size of the result: if a query returning, say, 1000 documents costs 1000 RUs on a single-partition collection, it will cost 1000 + P*2 RUs on a partitioned collection, where P is the number of partitions touched. You can of course eliminate this overhead by including a filter on the partition key.
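
The arithmetic can be sketched in a few lines of Python. The function and parameter names are hypothetical, and the 2 RU default is only the approximate per-partition figure quoted above, not a guaranteed constant.

```python
def cross_partition_query_rus(base_rus: float, partitions_touched: int,
                              overhead_per_partition: float = 2.0) -> float:
    """Estimate the RU charge of a query that fans out across partitions.

    base_rus is what the same query would cost on a single-partition
    collection; the overhead is a fixed amount per partition touched and
    does not scale with the number of documents returned.
    """
    if partitions_touched < 0:
        raise ValueError("partitions_touched must be non-negative")
    return base_rus + partitions_touched * overhead_per_partition


# Hypothetical example: a 1000 RU query touching 25 partitions.
large_query = cross_partition_query_rus(1000, 25)
# The same 25 partitions add the same overhead to a tiny 2 RU query.
small_query = cross_partition_query_rus(2, 25)
print(large_query, small_query)
```

The overhead is the same 50 RUs in both cases, so it dominates a tiny query but is only a few percent of a large one.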

Upvotes: 1
