Aswin Srinivasan

Reputation: 13

Sub-partitioning or composite partitioning in DocumentDB

An MSDN article, https://azure.microsoft.com/en-in/documentation/articles/documentdb-partition-data/, has a line saying that "sub-partitioning" or "complex partitioning" can be done. Does this mean:

  1. Can there be sub-partitioning inside a collection?
  2. Can a single DocumentDB database use more than one partitioning scheme? For example, I will have four collections inside a single DocumentDB database. Can two of them be partitioned by hash and the other two by range?

If the answer to either question is yes, can someone point me to a link with an example?

Upvotes: 1

Views: 228

Answers (1)

Larry Maccherone

Reputation: 9523

Answers:

  1. There is no explicit method to sub-partition data within a collection. It's common to use a field to represent the type of document, or to put isTypeA: true key-value pairs on each document, but that's a convention that your application adopts. However, you can create multiple databases per account (the default limit is 5, but it may be raised upon request), and each can have its own set of collections. I'm using that two-level hierarchy in temporalize-api. TenantID determines my top-level partition (database) using a lookup table plus defaults. This allows me to pull critical or high-value tenants into a less loaded database and leave everyone else in the default. I use a consistent hash on the EntityID for second-level partitioning (collection).

  2. Sure, there is nothing preventing you from doing that. Pay particular attention to the excellent discussion in the last section (Developing a partitioned application) in the Aravind article you linked to. It includes a checklist of things you'll need to decide upon and implement. The partition resolvers provided for the .NET SDK do not take care of these issues for you.
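To make the two-level scheme from answer 1 concrete, here is a minimal TypeScript sketch: a tenant lookup table with a default picks the database, and a consistent hash ring over the EntityID picks the collection. It is not taken from temporalize-api or from any DocumentDB SDK. Every name in it (`makeResolver`, `HashRing`, `fnv1a`, the database and collection names) is hypothetical, and the FNV-1a style hash and 64 virtual points per collection are arbitrary choices for illustration.

```typescript
// Hypothetical sketch: none of these names come from a real SDK.

// 32-bit FNV-1a style hash over UTF-16 code units, used to place keys on the ring.
function fnv1a(key: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Consistent hash ring: each node gets several virtual points, and a key
// maps to the first point at or after its own hash, wrapping around.
class HashRing {
  private readonly points: { hash: number; node: string }[] = [];

  constructor(nodes: string[], replicas = 64) {
    for (const node of nodes) {
      for (let r = 0; r < replicas; r++) {
        this.points.push({ hash: fnv1a(`${node}#${r}`), node });
      }
    }
    this.points.sort((a, b) => a.hash - b.hash);
  }

  lookup(key: string): string {
    const h = fnv1a(key);
    // Linear scan keeps the sketch short; a binary search would be the usual choice.
    const hit = this.points.find((p) => p.hash >= h) ?? this.points[0];
    if (hit === undefined) throw new Error("ring has no nodes");
    return hit.node;
  }
}

interface Target {
  database: string;
  collection: string;
}

interface ResolverConfig {
  tenantToDatabase: Map<string, string>; // explicit overrides for selected tenants
  defaultDatabase: string; // everyone else lands here
  collectionsByDatabase: Map<string, string[]>;
}

// Level 1: tenant lookup table plus default. Level 2: consistent hash on entity ID.
function makeResolver(config: ResolverConfig): (tenantId: string, entityId: string) => Target {
  const rings = new Map<string, HashRing>();
  return (tenantId, entityId) => {
    const database = config.tenantToDatabase.get(tenantId) ?? config.defaultDatabase;
    let ring = rings.get(database);
    if (ring === undefined) {
      ring = new HashRing(config.collectionsByDatabase.get(database) ?? []);
      rings.set(database, ring);
    }
    return { database, collection: ring.lookup(entityId) };
  };
}

const resolve = makeResolver({
  tenantToDatabase: new Map<string, string>([["acme", "db-premium"]]),
  defaultDatabase: "db-default",
  collectionsByDatabase: new Map<string, string[]>([
    ["db-premium", ["p-0", "p-1"]],
    ["db-default", ["d-0", "d-1", "d-2"]],
  ]),
});

console.log(resolve("acme", "entity-42").database); // "db-premium"
console.log(resolve("someone-else", "entity-42").database); // "db-default"
```

The point of a ring rather than a plain `hash % n` is that adding a collection only moves the keys that fall next to its new points, which matters for the rebalancing question raised in the checklist.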

I haven't yet seen open source examples of what I would consider a complete system, including balancing when capacity is added, where to store the partition maps/metadata, and query fan-out/aggregate optimization. I have a Node.js one under way (temporalize-api) that is actually in production. I've made decisions about how I'm going to do balancing and query fan-out, and those are documented in the comments in that linked file, but I have not implemented all of them. I store the partition metadata in the "first" collection of the "first" database.
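For readers unfamiliar with the term, query fan-out with aggregation can be sketched roughly as follows: send the same query to every collection, then merge the partial results. This is an illustration only, not the design documented in temporalize-api. `fanOut`, `mergePartials`, `PartialAggregate`, and the `queryOne` callback are hypothetical, and `queryOne` stands in for whatever SDK call actually runs the query against one collection.

```typescript
// Hypothetical shape of the partial aggregate returned by one collection.
interface PartialAggregate {
  count: number;
  sum: number;
}

// Run the same query against every collection concurrently.
// `queryOne` is an assumption of this sketch, not a real SDK function.
async function fanOut(
  collections: string[],
  queryOne: (collection: string) => Promise<PartialAggregate>
): Promise<PartialAggregate[]> {
  return Promise.all(collections.map((c) => queryOne(c)));
}

// Merge partial aggregates. The average is derived from the merged count and
// sum, because averaging the per-collection averages would weight them wrongly.
function mergePartials(
  partials: PartialAggregate[]
): { count: number; sum: number; average: number | null } {
  const count = partials.reduce((n, p) => n + p.count, 0);
  const sum = partials.reduce((s, p) => s + p.sum, 0);
  return { count, sum, average: count === 0 ? null : sum / count };
}

// In-memory stand-in for two collections.
const fakeResults = new Map<string, PartialAggregate>([
  ["d-0", { count: 2, sum: 10 }],
  ["d-1", { count: 3, sum: 20 }],
]);

fanOut(Array.from(fakeResults.keys()), async (c) => fakeResults.get(c) ?? { count: 0, sum: 0 })
  .then((partials) => console.log(mergePartials(partials)));
// logs { count: 5, sum: 30, average: 6 }
```

Counts and sums merge cleanly this way. Sorted or top-N queries need a different merge step, which is one of the optimization decisions the answer refers to.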

Upvotes: 1
