Reputation: 41
I am trying to determine the best partition key for a CosmosDB table that has both a customer ID (unique value for each customer) and customer city (in North America, which yields thousands of possible values).
Reading the Azure documentation, I see a lot of conflicting information about which one is best. Some of the documents say that the more unique value will provide a better spread of items across partitions, while others state that using city would be best.
So my question(s) are:
Is each partition key hashed, and does each partition contain items whose keys fall within a range of hashes? I.e., if customer ID is the partition key, would one partition hold IDs 1 through 1000, another 1001 through 2000, etc.? Same with city: would one partition hold multiple cities? Or would each partition be mapped 1:1 to a specific partition key value, i.e. a single ID or city?
Based on the above, which one would be better (more performant, lower cost)? Having as granular a partition key as possible (i.e. customer ID)? Or customer city?
Thank you!
Upvotes: 4
Views: 3072
Reputation: 82096
A good analogy for understanding how partitioning works is to think about finding someone's address:
If I gave you the key to my house (item ID) but nothing else, you would need to try every door in the world until you happened to stumble upon the right one (aka a cross-partition query). If I told you the country (partition key), then you could immediately eliminate millions of doors, but you'd still have millions of doors to check, so still not very efficient. If I gave you the city, fewer again but still a lot to check... but if I gave you my postcode, then we've just optimized a query from billions of records to 15-20.
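To make the analogy concrete, here is a minimal toy sketch in Python. It is not how Cosmos DB is implemented: `ToyContainer`, `hash_key`, `physical_partition_for`, the use of MD5, and the fixed count of four physical partitions are all assumptions made up for illustration. What it does mirror is the general idea that the partition key value is hashed, each physical partition owns a range of the hash space, and so one physical partition holds many partition key values (many customer IDs, or many cities) rather than being mapped 1:1 to a single key.

```python
import hashlib
from collections import defaultdict

# Assumption for illustration only: a fixed number of physical partitions
# and MD5 as the hash. The real service manages both of these itself.
NUM_PHYSICAL_PARTITIONS = 4
HASH_SPACE = 2 ** 32


def hash_key(partition_key: str) -> int:
    """Map a partition key value to a number in [0, HASH_SPACE)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def physical_partition_for(partition_key: str) -> int:
    """Each physical partition owns one contiguous slice of the hash space."""
    range_size = HASH_SPACE // NUM_PHYSICAL_PARTITIONS
    return hash_key(partition_key) // range_size


class ToyContainer:
    def __init__(self) -> None:
        # physical partition -> partition key value -> items with that key
        self.partitions = [defaultdict(list) for _ in range(NUM_PHYSICAL_PARTITIONS)]

    def insert(self, partition_key: str, item: dict) -> None:
        target = self.partitions[physical_partition_for(partition_key)]
        target[partition_key].append(item)

    def query_with_key(self, partition_key: str, item_id: str):
        """Knowing the partition key, go straight to one physical partition."""
        target = self.partitions[physical_partition_for(partition_key)]
        matches = [i for i in target.get(partition_key, []) if i["id"] == item_id]
        return matches, 1  # one partition touched

    def query_without_key(self, item_id: str):
        """Without the partition key, every physical partition must be checked."""
        matches = []
        for partition in self.partitions:
            for items in partition.values():
                matches.extend(i for i in items if i["id"] == item_id)
        return matches, len(self.partitions)  # all partitions touched


container = ToyContainer()
for n in range(1000):
    # Customer ID used as the partition key, as in the question.
    container.insert(f"customer-{n}", {"id": f"customer-{n}", "city": "Toronto"})

_, touched_with_key = container.query_with_key("customer-42", "customer-42")
_, touched_without_key = container.query_without_key("customer-42")
print(touched_with_key, touched_without_key)  # 1 4
```

In this sketch, the lookup that supplies the partition key touches a single partition, while the lookup by item ID alone fans out to all of them, which is the "try every door" case from the analogy.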
Upvotes: 5