Paco de la Cruz

Reputation: 2154

UniqueId or a Substring of UniqueId as Partition Key in Cosmos DB?

We have a Cosmos DB Collection with around 1 million documents containing user information. Not many additions or updates are done per day. However, we need very high throughput for reading.

Most of the queries will be based on UserId. The UserId property is a numeric value composed of a running number and a check digit.

Based on the official documentation, one could argue that both the full UserId and a substring of the UserId (say, the last 4 digits) would make a good partition key.

In the future, we might have more than one document per UserId, but let's assume no more than 5.

My understanding is that a balance between the number of partitions and the number of documents per partition is also desirable, so having 1 document per partition across 1 million partitions is not ideal either. However, on this SO thread, a Microsoft engineer suggests using the full unique identifier as the partition key. (It's worth noting that our case is slightly different, as here the UserId is a running number and not a random GUID.) In addition, the comments on this blog post also suggest using the ID as the partition key.

So, considering that: a) this collection will be used mostly for read operations, b) we will have between 1 and 2 million UserIds, c) we won't have more than 5 docs per UserId, and d) we don't have a requirement for stored procedures (SPs) or transactions across multiple users, which partition key would be more performant?

  1. The Full UserId
  2. A substring of the UserId (e.g. last 4 digits)
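To put rough numbers on the two options, here is a minimal Python sketch that counts logical partitions and documents per partition for each candidate key. It is only an illustration under assumptions: the real check-digit algorithm isn't stated above, so `check_digit` below uses a made-up "digit sum modulo 10" scheme, and the running numbers are assumed to be 0 to 999,999, zero-padded to 6 digits. The function names are hypothetical as well.

```python
from collections import Counter

N_USERS = 1_000_000  # assumed: running numbers 0..999,999


def check_digit(running: int) -> int:
    # Hypothetical scheme (digit sum modulo 10); the actual
    # check-digit algorithm is not given in the question.
    return sum(int(d) for d in str(running)) % 10


def user_id(running: int) -> str:
    # UserId = zero-padded running number followed by its check digit.
    return f"{running:06d}{check_digit(running)}"


def partition_stats(keys, docs_per_user=1):
    # Count how many documents land in each logical partition.
    counts = Counter(keys)
    sizes = counts.values()
    return {
        "partitions": len(counts),
        "min_docs": min(sizes) * docs_per_user,
        "max_docs": max(sizes) * docs_per_user,
    }


ids = [user_id(n) for n in range(N_USERS)]

# Option 1: the full UserId is the partition key.
full = partition_stats(ids)
# Option 2: the last 4 digits of the UserId are the partition key.
suffix = partition_stats(i[-4:] for i in ids)

print("full UserId  :", full)
print("last 4 digits:", suffix)
# Worst case from the question: 5 documents per UserId.
print("last 4 digits, 5 docs/user:", partition_stats((i[-4:] for i in ids), 5))
```

Under these assumptions, option 1 gives 1,000,000 logical partitions with 1 document each (up to 5 later on), and option 2 gives 10,000 logical partitions with 100 documents each (up to 500). Both are evenly distributed, so the sketch by itself doesn't favour either key; it only shows that neither option creates a hot partition for this kind of Id.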

Upvotes: 1

Views: 914

Answers (1)

Paco de la Cruz

Reputation: 2154

Based on @RafatSarosh's comments and further research, I've learned that having millions of partitions with 1 document per partition is not bad practice; we can rely on Cosmos DB's query execution optimisation.

We'll be using the userId as Partition Key.

HTH

Upvotes: 2
