Reputation: 2154
We have a Cosmos DB Collection with around 1 million documents containing user information. Not many additions or updates are done per day. However, we need very high throughput for reading.
Most of the queries will be based on UserId. The UserId property is a numeric value composed of a running number and a check digit.
Based on the official documentation
One could argue that both the full UserId and a substring of the UserId (say, the last 4 digits) would make a good partition key, i.e.
In the future, we might have more than one document per UserId, but let's assume no more than 5.
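To get a feel for the two candidates, here is a rough sketch in Python that builds UserIds from a running number plus a check digit and counts how many logical partitions each key would produce. The check-digit scheme (digit sum modulo 10) and all function names are made up for illustration, since the real algorithm isn't relevant here, and it assumes one document per UserId.

```python
from collections import Counter


def check_digit(running: int) -> int:
    # Hypothetical scheme: digit sum modulo 10 (the real algorithm may differ).
    return sum(int(d) for d in str(running)) % 10


def make_user_id(running: int) -> int:
    # UserId = running number followed by its check digit.
    return running * 10 + check_digit(running)


def full_key(user_id: int) -> str:
    # Candidate 1: the whole UserId is the partition key.
    return str(user_id)


def suffix_key(user_id: int, digits: int = 4) -> str:
    # Candidate 2: the last `digits` digits, zero-padded.
    return str(user_id % 10**digits).zfill(digits)


def summarize(keys):
    # Count documents per partition key value.
    counts = Counter(keys)
    return {
        "partitions": len(counts),
        "min_docs": min(counts.values()),
        "max_docs": max(counts.values()),
    }


def compare(n_users: int = 100_000):
    user_ids = [make_user_id(r) for r in range(1, n_users + 1)]
    return {
        "full": summarize(full_key(u) for u in user_ids),
        "suffix": summarize(suffix_key(u) for u in user_ids),
    }


if __name__ == "__main__":
    print(compare())
```

With the full UserId, every key value holds exactly one document (or up to 5 later on). With the 4-digit suffix there can never be more than 10,000 distinct key values, so each one holds many users.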
My understanding is that a balance between the number of partitions and the number of documents per partition is also desirable, so having 1 document per partition across 1 million partitions is not ideal either. However, on this SO thread, a Microsoft engineer suggests using the full unique identifier as the partition key. (It's worth noting that our case is slightly different, as here the UserId is a running number and not a random GUID.) In addition, the comments on this blog post also suggest using the ID as the partition key.
So, considering that: a) this collection will be used mostly for read operations, b) we will have between 1 and 2 million UserIds, c) we won't have more than 5 docs per UserId, and d) we have no requirement for stored procedures or transactions across multiple users, which partition key would be more performant?
Upvotes: 1
Views: 914
Reputation: 2154
Based on @RafatSarosh's comments and further research, I've learned that having millions of partitions with 1 document per partition is not a bad practice; we can rely on Cosmos DB's query execution optimisation.
We'll be using the userId as Partition Key.
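For reference, a minimal sketch of what a lookup could look like once userId is the partition key. It assumes the documents store the value in a `userId` property and that `container` behaves like a container object from the `azure-cosmos` Python SDK, whose `query_items` accepts `query`, `parameters` and `partition_key`. `find_user_docs` and `InMemoryContainer` are hypothetical names; the latter is only a stand-in so the snippet runs without an Azure account.

```python
from typing import Any, Dict, List

QUERY = "SELECT * FROM c WHERE c.userId = @userId"


def find_user_docs(container: Any, user_id: int) -> List[Dict[str, Any]]:
    # Passing partition_key scopes the query to one logical partition,
    # so it does not have to fan out across the whole container.
    items = container.query_items(
        query=QUERY,
        parameters=[{"name": "@userId", "value": user_id}],
        partition_key=user_id,
    )
    return list(items)


class InMemoryContainer:
    """Stand-in for a real container (illustration only, not an SDK class)."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self._docs = docs

    def query_items(self, query, parameters, partition_key):
        # Mimics a single-partition query: only documents whose userId
        # equals the given partition key value are returned.
        return (d for d in self._docs if d["userId"] == partition_key)


if __name__ == "__main__":
    container = InMemoryContainer(
        [
            {"id": "a", "userId": 1236},
            {"id": "b", "userId": 1236},
            {"id": "c", "userId": 50},
        ]
    )
    print(find_user_docs(container, 1236))
```

Against a real account you would pass the actual container instead of the stand-in; the partition key value has to have the same type (number vs. string) as the one stored in the documents.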
HTH
Upvotes: 2