Saulo Ricci

Reputation: 774

Partitioning in DocumentDB

I'm wondering about the partition key in a partitioned DocumentDB collection in the following scenario:

  1. Each document C in the collection contains two fields: a and b.
  2. Lookups for document C must be fast when both fields a and b are used as criteria (in the WHERE clause of the SQL query).

I believe I need to specify both fields somehow to accomplish goal 2. Is there any way to specify both fields a and b as partition keys for my collection?

If not, is there any alternative solution?

Upvotes: 0

Views: 1389

Answers (2)

PartlyCloudy

Reputation: 701

I think you may have two notions mixed up here - Partitioning and Indexing.

To support fast retrievals using both a and b as criteria, your documents need to be indexed on those fields. Luckily, DocumentDB indexes every property by default, so these queries are fast without extra configuration. See https://learn.microsoft.com/en-us/azure/documentdb/documentdb-indexing
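As a rough illustration, a lookup on both fields could be expressed as a parameterized query like the one sketched below. The code only builds the query specification (SQL text plus a list of name/value parameters); the helper name `build_lookup_query`, the alias `c`, and the sample values are assumptions made for the example, and handing the spec to an SDK client is left out.

```python
def build_lookup_query(a_value, b_value):
    """Return a parameterized query spec that filters on both a and b."""
    return {
        "query": "SELECT * FROM c WHERE c.a = @a AND c.b = @b",
        "parameters": [
            {"name": "@a", "value": a_value},
            {"name": "@b", "value": b_value},
        ],
    }


# Placeholder values, purely for illustration.
spec = build_lookup_query("alice", "bob")
```

Both filters are plain equality predicates, which is the kind of lookup the automatic index is meant to serve.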

Partitioning is a way to split your data, if you have a lot of it, across multiple partitions, in order to deal with data growing beyond what a single partition can hold. When you specify a partition key, documents with the same key value go to the same partition. See https://learn.microsoft.com/en-us/azure/documentdb/documentdb-partition-data

So what logic should you consider when picking a partition key? As a rule of thumb, you want documents that come up together in your queries to be found in the same partition. So for example, if you do a lot of queries that return all documents for a given userId, you may want to partition by user id.
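To make the rule of thumb concrete, here is a toy sketch of key-based routing: documents that share a partition key value are always routed together. The SHA-256 hash, the count of four partitions, and the sample documents are assumptions chosen for the illustration; this is not DocumentDB's actual placement logic.

```python
import hashlib


def partition_for(key, partition_count=4):
    """Map a partition key value to a partition number with a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


# Made-up documents keyed by userId.
docs = [
    {"id": "1", "userId": "u1"},
    {"id": "2", "userId": "u2"},
    {"id": "3", "userId": "u1"},
]

# Group document ids by the partition their userId routes to.
partitions = {}
for doc in docs:
    partitions.setdefault(partition_for(doc["userId"]), []).append(doc["id"])
```

Documents "1" and "3" share userId "u1", so they end up in the same group, and a query for that user only needs to look in one place.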

Upvotes: 1

Aravind Krishna R.

Reputation: 8003

There are two ways to do this:

  • Pick either a or b (receiver) as the partition key. Queries that filter on the partition key are executed against a single partition, and since DocumentDB automatically indexes all properties, the filter on the other property is served by the index.
  • Create a new property that's the concatenated value of a and b (e.g. from:[email protected];to:[email protected]) and use that as the partition key. Then, when performing queries, include the new property as a filter.

The second approach will be more efficient than the first for queries that filter on both a and b. If you have a mix of queries that filter on just a or on both a and b, then the first approach with a as the partition key is better, as both kinds of query will run against a single partition.
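The second approach might be sketched as follows. The `from:...;to:...` layout mirrors the example above, while the property name `ab`, the helper names, and the sample values are hypothetical; the sketch also assumes that neither value contains the `;` separator, since otherwise two different pairs could produce the same key.

```python
def composite_key(a_value, b_value):
    """Concatenate a and b into one partition key value."""
    return f"from:{a_value};to:{b_value}"


def with_composite_key(doc):
    """Return a copy of the document with the composite property added."""
    enriched = dict(doc)
    enriched["ab"] = composite_key(doc["a"], doc["b"])
    return enriched


def build_query(a_value, b_value):
    """Build a parameterized query spec that filters on the composite property."""
    return {
        "query": "SELECT * FROM c WHERE c.ab = @ab",
        "parameters": [
            {"name": "@ab", "value": composite_key(a_value, b_value)},
        ],
    }


# Placeholder values, purely for illustration.
doc = with_composite_key({"id": "1", "a": "alice", "b": "bob"})
query = build_query("alice", "bob")
```

The same helper produces the value both when writing the document and when querying, so the filter always matches the stored partition key.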

But as others mentioned, you will get low-latency query responses with either approach, or even if you picked a different partition key such as a transaction ID. The approaches above will simply be the most efficient for a query workload that filters on a and b.

Upvotes: 2
