ppeddi
ppeddi

Reputation: 217

how to achieve even distribution of partitions keys in Cassandra

We modeled our data in cassandra table with partition key, lets say "pk". We have a total of 100 unique values for pk and our cluster size is 160. We are using random partitioner. When we add data to Cassandra (with replication factor of 3) for all 100 partitions, I noticed that those 100 partitions are not distributed evenly. One node has as many as 7 partitions and lot of nodes only has 1 or no partition. Given that we are using random partitioner, I expected the distribution to be reasonably even. Because 7 partitions are in the same node, thats creating a hot partition for us. Is there a better way to distribute partitions evenly?

Any input is appreciated.

Thanks

Upvotes: 3

Views: 1230

Answers (1)

Jim Meyer
Jim Meyer

Reputation: 9475

I suspect the problem is the low cardinality of your partition key. With only 100 possible values, it's not unexpected that several values end up hashing to the same nodes.

If you have 160 nodes, then only having 100 possible values for your partition key will mean you aren't using all 160 nodes effectively. An even distribution of data comes from inserting a lot of data with a high cardinality partition key.

So I'd suggest you figure out a way to increase the cardinality of your partition key. One way to do this is to use a compound partition key by including some part of your clustering columns or data fields into your partition key.

You might also consider switching to the Murmur3Partitioner, which generally gives better performance and is the current default partitioner on the newest releases. But you'd still need to address the low cardinality problem.

Upvotes: 4

Related Questions