Confusing partitioning key of CosmosDB

Question

I am working my way through the Python examples of CosmosDB (see CosmosDB for Python) and I see a container definition as follows:

    partition_key = PartitionKey(path='/id', kind='Hash')
    db.create_container(id=id, partition_key=partition_key)

Code for reading an item:

    response = container.read_item(item=doc_id, partition_key=doc_id)

What confuses me is why the chosen partition key is the same as the unique document id. What is the use of partitioning here?

In my opinion, partition is something which applies over keys sharing some common group, for example partition over food groups.

Anupam Chand · Accepted Answer

> In my opinion, partition is something which applies over keys sharing some common group, for example partition over food groups.

This is not entirely true. The documentation says that you should choose a partition key with high cardinality; in other words, the property should have a wide range of possible values. It should also be a value that will not change. Note too that if you want to update or delete a document, you will need to pass the partition key.
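To illustrate why the partition key has to be passed along with the id, here is a toy in-memory model. It is not the azure-cosmos SDK: `ToyContainer` and its methods are hypothetical names that only mimic the shape of the calls in the question. It assumes, as Cosmos does, that an id only has to be unique within one partition key value.

```python
# Toy in-memory model, NOT the azure-cosmos SDK; all names are hypothetical.
class ToyContainer:
    def __init__(self):
        # partition key value -> {item id -> document}
        self._partitions = {}

    def upsert_item(self, doc, partition_key):
        self._partitions.setdefault(partition_key, {})[doc["id"]] = doc

    def read_item(self, item, partition_key):
        # The partition key picks the bucket, the id picks the document in it.
        return self._partitions[partition_key][item]

    def delete_item(self, item, partition_key):
        del self._partitions[partition_key][item]


container = ToyContainer()

# The same id can exist under two different partition key values,
# so the id alone does not identify a single document.
container.upsert_item({"id": "1", "group": "fruit"}, partition_key="fruit")
container.upsert_item({"id": "1", "group": "dairy"}, partition_key="dairy")

container.delete_item(item="1", partition_key="fruit")
remaining = container.read_item(item="1", partition_key="dairy")
print(remaining)
```

When the partition key path is `/id`, as in the question, the two values are always equal, which is why `read_item` is called with `doc_id` twice.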

What happens in the background is that Cosmos can run on any number of servers (physical partitions), starting from one. It uses your partition key to partition your data logically, but at first all of it still lives on one server. If your throughput goes beyond 10,000 RU/s or your storage goes beyond 50 GB, Cosmos automatically splits the data across 2 physical servers. The splitting continues until every server is under 10,000 RU/s of throughput and 50 GB of storage. This is how Cosmos can scale almost without limit.

You may ask how you would predict which partition a document goes into. The answer is that you can't. Cosmos hashes your partition key value, and the hash determines which server the document lands on.
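The hash-and-split idea can be sketched roughly as follows. This is a simplified model under stated assumptions, not Cosmos's actual algorithm: SHA-256 stands in for whatever hash Cosmos uses internally, the hash space is an arbitrary 32 bits, and `key_hash`, `server_for`, and `split` are hypothetical helpers. Each server owns a contiguous range of hash values, and a split cuts one range in half.

```python
import bisect
import hashlib

HASH_SPACE = 2**32  # assumed size of the hash space, for illustration only


def key_hash(partition_key: str) -> int:
    """Map a partition key value to a number in [0, HASH_SPACE)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def server_for(partition_key: str, boundaries: list) -> int:
    """Return the index of the server whose hash range contains the key.

    `boundaries` holds the exclusive upper end of each server's range,
    in ascending order.
    """
    return bisect.bisect_right(boundaries, key_hash(partition_key))


def split(boundaries: list, index: int) -> list:
    """Split server `index` in two by cutting its hash range in half."""
    low = boundaries[index - 1] if index > 0 else 0
    high = boundaries[index]
    mid = (low + high) // 2
    return boundaries[:index] + [mid] + boundaries[index:]


one_server = [HASH_SPACE]            # a single server owns the whole range
two_servers = split(one_server, 0)   # a limit was exceeded, so split

for doc_id in ["doc-1", "doc-2", "doc-3", "doc-4"]:
    print(doc_id, server_for(doc_id, one_server), server_for(doc_id, two_servers))
```

The placement is deterministic, since the same key always hashes to the same value, but you cannot tell where a key will land just by looking at it.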

So the doc id is a good partition key because it is unique and can have a large range of values.
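A small simulation shows the difference between a high-cardinality key and a grouping key. It is only a sketch: the modulo placement, the SHA-256 hash, the server count, and the names `server_for`, `N_SERVERS`, and `food_groups` are all assumptions made for illustration, not how Cosmos is implemented.

```python
import hashlib
from collections import Counter


def server_for(partition_key: str, n_servers: int) -> int:
    """Simplified placement: hash the key, then take it modulo the server count."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_servers


N_SERVERS = 8
N_DOCS = 9000
food_groups = ["fruit", "vegetable", "dairy"]

# Partitioning by unique document id: 9000 distinct key values.
by_id = Counter(server_for(f"doc-{i}", N_SERVERS) for i in range(N_DOCS))

# Partitioning by food group: only 3 distinct key values.
by_group = Counter(
    server_for(food_groups[i % len(food_groups)], N_SERVERS) for i in range(N_DOCS)
)

print("servers used when partitioning by id:        ", len(by_id))
print("servers used when partitioning by food group:", len(by_group))
print("busiest server by id:        ", max(by_id.values()), "documents")
print("busiest server by food group:", max(by_group.values()), "documents")
```

With only three distinct key values, at most three servers can ever hold data, no matter how many servers exist, while unique ids spread across all of them.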

Just be aware that once Cosmos has partitioned to multiple servers, there is currently no automatic way to bring the number of servers back down, even if you reduce the storage or the throughput.
