Reputation: 190
We’re developing a personnel management system based on blazor and Cosmos DB serverless. There will be one customer per database and around 30 “docTypes”. The biggest categories by number and data volume are "users" and "employees". When we query we get all data of users and employee at once. So it can be several thousand. The other doctypes are much smaller an less frequently queried.
The volume of data per customer will not exceed 5 GByte. The most frequent queries are to 3 docTypes.
Would it make more sense to use customerId (so all data is in one partition) or docType as a partition key?
thanks
Upvotes: 0
Views: 1959
Reputation: 164
I would highly recommend you first take a look at this segment of the Data Modelling & Partitioning presentation by Thomas Weiss, Cosmos DB program manager. In my view it's one of the best resources to understand how to think about partitioning.
Do agree with David Makogon that you didn't provide enough data. For instance, we know there are 30 doc types per single database - given cosmosdb database uses containers, I actually expect each docType to have its own container - contrary to what you wrote:
Would it make more sense to use customerId (so all data is in one partition) or docType as a partition key?
Which suggests you want to use a single container for all your data. I wouldn't keep users and employees as documents in the same container. They are separate domains and deserve their own container. See Azure docs page on Partition Strategy and subsequent paragraph about access patterns. The recommendation is to:
Choose a partition key that enables access patterns to be evenly spread across logical partitions.
In the access patterns section, the good practice mentioned is to separate data into hot, medium and cold data and place it into their own containers. One caveat is, that according to this page the max number of containers per database with shared throughput is 25.
If that is not possible, and all data has to end up in a single container, then docType seems to be the right partition key, because your queries will get data by docType if I understood correctly.
As 404 wrote, you want to avoid Hot Partitioning i.e. jamming most of documents in a container into a single or a few logical partitions. Therefore you want to choose a partition key based on most frequent operations.
Upvotes: 2
Reputation: 6157
Based on the information you supplied it sound like docType
is a good property to use as partition key, since it can be used to avoid cross partition queries. Especially since you state this will be often be used in your queries. With the max size you stated it will also be unlikely to cause you issues as a single partition can contain up to 20GB of data.
One thing to watch out for is Hot Partitioning. You state that your users
partition might be a lot bigger than others. That can result into one partition doing all of the lifting while the others sit mostly idle which results and causing inefficiëncy of your total throughput.
On the other side it won't really matter for your use case. Since none of the databases will exceed that 5GB you'll always stay within a single partition, but it's always good though to think about it beforehand; As situations may change and you end up with a database that does split into partitions.
Lastly I would never use a single partition for all data. It has no benefits. If you have no properties that could serve as partition key then id
is the better choice (so a logical partition per document). It won't hit storage limitations and evenly distributes throughput between partitions.
Upvotes: 1