coder112

Reputation: 41

Is a unique ID the best partition key for CosmosDB

I am trying to determine the best partition key for a CosmosDB table that has both a customer ID (unique value for each customer) and customer city (in North America, which yields thousands of possible values).

Reading the Azure documentation, I see a lot of conflicting information about which one is best. Some of the documents say that the more unique value will provide a better spread of items across partitions, while others state that using city would be best.

So my question(s) are:

  1. Is each partition key hashed, and does each partition contain items whose keys fall within a range of hashes? I.e., if customer ID is the partition key, would one partition hold IDs 1 through 1000, another 1001 through 2000, etc.? Same with city: would one partition hold multiple cities? Or would each partition be mapped 1:1 to a specific partition key value, i.e. a single ID or a single city?

  2. Based on the above, which one would be better (more performant, lower cost)? Having as granular a partition key as possible (i.e., customer ID)? Or customer city?

Thank you!

Upvotes: 4

Views: 3072

Answers (1)

James

Reputation: 82096

  • yes, partition keys are hashed and those hashes determine where logical partitions are physically stored
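  • no, a logical partition will only ever contain records with the same partition key value (that's basically the point: co-locate associated records). So in your example, each ID or city maps 1:1 to a logical partition. A physical partition, on the other hand, can host many logical partitions.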
  • cost is irrelevant because you aren't charged for partitions (although logical partitions do have a size limit), so the question comes down to performance, and that depends entirely on how your application queries the data.
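
To make the hashing point concrete, here is a minimal Python sketch of how hashed partition keys could be spread over a handful of physical partitions. It is only an illustration: the MD5-based `hash_key`, the 32-bit `HASH_SPACE`, the count of 4 physical partitions, and the `customer-N` key format are all assumptions made for the example, not Cosmos DB's actual internals (Cosmos DB uses its own hash function and manages physical partitions itself).

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 32  # illustrative hash space, not Cosmos DB's real one


def hash_key(partition_key: str) -> int:
    """Stand-in hash (assumption); Cosmos DB uses its own internal function."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def physical_partition_for(partition_key: str, physical_count: int) -> int:
    """Map a key's hash onto one of `physical_count` contiguous hash ranges."""
    return hash_key(partition_key) * physical_count // HASH_SPACE


def place(keys, physical_count):
    """Group logical partition keys by the physical partition they hash to."""
    placement = defaultdict(set)
    for key in keys:
        placement[physical_partition_for(key, physical_count)].add(key)
    return dict(placement)


# Hypothetical data: 2000 customers, each ID is its own logical partition.
customer_ids = [f"customer-{i}" for i in range(1, 2001)]
placement = place(customer_ids, physical_count=4)

for physical, logical_keys in sorted(placement.items()):
    print(f"physical partition {physical}: {len(logical_keys)} logical partitions")
```

In this sketch the same key always hashes to the same place, and each physical partition ends up holding many logical partitions. Because placement follows the hash rather than the raw value, consecutive IDs such as 1 through 1000 are not kept together as a block.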

A good analogy for understanding how partitioning works is to think about finding someone's address:

If I gave you the key to my house (item ID) but nothing else, you would need to try every door in the world until you happened to stumble upon the right one (aka a cross-partition query). If I told you the country (partition key), you could immediately eliminate millions of doors, but you'd still have millions of doors to check, so that's still not very efficient. If I gave you the city, fewer again, but still a lot to check. But if I gave you my postcode, then we've just optimized a query from billions of records to 15-20.
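
The analogy can be sketched as a toy in-memory model. This is not the Cosmos DB SDK: `ToyContainer`, its `upsert` and `find` methods, and the sample customers are hypothetical names invented for illustration, and a real container uses indexes rather than a linear scan. The only thing the sketch tries to show is how many partitions have to be checked with and without the partition key.

```python
from collections import defaultdict


class ToyContainer:
    """Hypothetical stand-in for a container: items grouped by partition key."""

    def __init__(self, partition_key_field: str):
        self.partition_key_field = partition_key_field
        self.partitions = defaultdict(list)

    def upsert(self, item: dict) -> None:
        self.partitions[item[self.partition_key_field]].append(item)

    def find(self, item_id: str, partition_key=None):
        """Return (item, partitions_checked) for the given item ID."""
        if partition_key is not None:
            # Targeted lookup: only the one logical partition is checked.
            candidates = [self.partitions.get(partition_key, [])]
        else:
            # No partition key: every partition may have to be checked.
            candidates = list(self.partitions.values())
        checked = 0
        for items in candidates:
            checked += 1
            for item in items:
                if item["id"] == item_id:
                    return item, checked
        return None, checked


# Hypothetical data: 1000 customers spread over 10 cities.
container = ToyContainer(partition_key_field="city")
for i in range(1000):
    container.upsert({"id": f"c{i}", "city": f"city-{i % 10}"})

_, targeted = container.find("c999", partition_key="city-9")
_, fan_out = container.find("c999")
print(f"with partition key: {targeted} partition(s) checked")
print(f"without partition key: {fan_out} partition(s) checked")
```

Which key works best therefore depends on which value your queries can actually supply: a customer ID key suits lookups by ID, while a city key suits queries that always filter by city.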

Upvotes: 5
