srk

Reputation: 5116

What exactly is a partition size in cassandra?

I am new to Cassandra and have a Cassandra cluster with 6 nodes. I am trying to find the partition size.

I tried to fetch it with this basic command:

nodetool tablehistograms keyspace.tablename


Now I am wondering how it is calculated, and why the result has only 5 records besides min and max when the cluster has 6 nodes. Is there any relation between the number of nodes and the number of partitions for a table?

Fundamentally, what I know is that the partition key is hashed and used to distribute the data to be persisted across the various nodes.

When exactly should we go for bucketing? I am assuming that Cassandra has a partitioner that takes care of distributing the persisted data across nodes.

Upvotes: 0

Views: 711

Answers (2)

Erick Ramirez

Reputation: 16293

As the name of the command suggests, tablehistograms reports the distribution of metadata for the partitions held by a node.

To add to what Alex Ott has already stated, the percentiles (not percentages) provide insight into the range of metadata values. For example:

  • 50% of the partitions for the given table have a size of 74KB or less
  • 95% are 263KB or less
  • 98% are 455KB or less
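To make the percentile idea concrete, here is a minimal sketch using the nearest-rank method. The partition sizes are made-up values for illustration, and this is not how Cassandra computes the figures internally (it works from estimated histograms), so treat it only as a way to read the output:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of all values are less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical partition sizes (KB) for one table on one node.
partition_sizes_kb = [12, 30, 41, 55, 74, 80, 120, 200, 263, 455]

# The same five percentiles that tablehistograms reports.
for p in (50, 75, 95, 98, 99):
    print(f"{p}%: {percentile(partition_sizes_kb, p)} KB or less")
```

The five rows come from the fixed list of percentiles, not from anything about the cluster, which is why the count stays the same no matter how many nodes you have.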

These metadata don't have any correlation with the number of partitions or the number of nodes in your cluster.

You are correct in that the partition key gets hashed and the resulting value determines where the partition (and its associated rows) get stored (distributed among nodes in the cluster). If you're interested, I've explained in a bit more detail with some examples in this post -- https://community.datastax.com/questions/5944/.
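As a rough illustration of that idea, the sketch below hashes a partition key to a token and looks up the owning node on a ring. Everything here is an assumption made for the example: it uses MD5 from the Python standard library as a stand-in hash (Cassandra's default partitioner is Murmur3), six hypothetical nodes with one evenly spaced token each, and no replication or virtual nodes:

```python
import bisect
import hashlib

RING_SIZE = 2**64  # size of the token space in this sketch (assumption)

def token_for(partition_key: str) -> int:
    """Hash a partition key to a token in [0, RING_SIZE).
    MD5 is only a stand-in for the real partitioner's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Hypothetical six-node cluster; each node owns the token range
# that ends at its own token.
NODES = [f"node{i}" for i in range(1, 7)]
RING = sorted(
    ((i + 1) * RING_SIZE // len(NODES) - 1, name)
    for i, name in enumerate(NODES)
)
TOKENS = [token for token, _ in RING]

def owner_of(partition_key: str) -> str:
    """Return the node whose token is the first one >= the key's token."""
    idx = bisect.bisect_left(TOKENS, token_for(partition_key))
    return RING[idx % len(RING)][1]  # wrap around the ring if needed

for key in ("user-1", "user-2", "user-3"):
    print(key, "->", owner_of(key))
```

The point to take away is that the same key always lands on the same node, and all rows sharing that partition key are stored together.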

As far as bucketing is concerned, you would typically do that to reduce the number of rows in a partition and therefore reduce its size. The general recommendation is to keep your partition sizes below 100MB for optimal performance, but it's not a hard rule: you can have larger partitions as long as you are aware of the tradeoffs.
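Here is a small sketch of what bucketing does to partition sizes. The sensor name, the readings, and the choice of a per-day bucket are all hypothetical; in a real table the bucket would be an extra column in the partition key:

```python
from collections import Counter
from datetime import datetime, timedelta

def bucketed_key(sensor_id, ts):
    """Composite partition key: the sensor plus a per-day bucket."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))

# Hypothetical data: one reading per hour for three days.
start = datetime(2024, 1, 1)
readings = [("sensor-1", start + timedelta(hours=h)) for h in range(72)]

# Partition key = sensor_id only: every row lands in one partition.
unbucketed = Counter(sensor_id for sensor_id, _ in readings)

# Partition key = (sensor_id, day): rows are split across daily partitions.
bucketed = Counter(bucketed_key(sensor_id, ts) for sensor_id, ts in readings)

print("rows per partition without bucketing:", dict(unbucketed))
print("rows per partition with daily buckets:", dict(bucketed))
```

Without the bucket the partition keeps growing for as long as the sensor reports. With the bucket each partition is capped at one day of rows, at the cost of having to query several partitions to read a longer time range.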

In your case, the largest partition is only 455KB so size is not a concern. Cheers!

Upvotes: 1

Alex Ott

Reputation: 87069

The number of entries in this column is not related to the number of nodes. It shows the distribution of the values - you have min, max, and percentiles (50/75/95/98/99).

Most nodetool commands don't show anything about other nodes. They are tools that provide information about the current node only.

P.S. This document would be useful in explaining how to interpret this information.

Upvotes: 1
