Cassandra vs RDBMS: Clustering columns

Question

A Cassandra primary key consists of a partition key and clustering columns. The partition key tells which node the data is on, and the clustering columns decide the order of rows on disk. Many read queries, and the ORDER BY clause, don't work unless we restrict the clustering columns in the correct order.
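
To make the "correct order" rule concrete, here is a simplified sketch in Python. The function name `restriction_is_servable` and the operator strings are hypothetical, not a Cassandra API. It assumes the rule is: the restricted clustering columns must form a prefix with no gaps, and only the last one may use a range operator. Real CQL has more cases (IN, ALLOW FILTERING, secondary indexes) that this model ignores.

```python
from typing import Dict, Sequence


def restriction_is_servable(clustering: Sequence[str], restrictions: Dict[str, str]) -> bool:
    """Simplified check: restricted clustering columns must be a gap-free prefix,
    with '=' on all of them except possibly a range on the last one."""
    restricted = [col for col in clustering if col in restrictions]
    # Restrictions on non-clustering columns are rejected in this simplified model.
    if len(restricted) != len(restrictions):
        return False
    # No gaps: restricting 'month' without 'year' is rejected.
    if restricted != list(clustering[: len(restricted)]):
        return False
    # Every restricted column before the last one must be an equality.
    return all(restrictions[col] == "=" for col in restricted[:-1])


clustering = ["year", "month", "day"]
print(restriction_is_servable(clustering, {"year": "=", "month": ">"}))  # True
print(restriction_is_servable(clustering, {"month": "="}))               # False: skips 'year'
print(restriction_is_servable(clustering, {"year": ">", "month": "="}))  # False: range before the end
```

Because rows are sorted by the full clustering tuple, an equality prefix plus one range on the next column always maps to a single contiguous run of rows, which can be read without scanning the rest of the partition.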

The role of the partition key is clear: without it, all nodes would have to be queried, which hurts performance. But once the partition key has identified the node, the problem reduces to finding the records just as in a traditional RDBMS, doesn't it? So why has the Cassandra data model made this different (and, dare I say, more difficult) by adding the concept of clustering columns? Ordering and the like could be done the same way as in an RDBMS, couldn't it?
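
For the node-lookup part, a toy token ring in Python shows how the partition key alone picks the owning node. Assumptions: SHA-256 stands in for Cassandra's Murmur3 partitioner, each node has a single token (no vnodes), and replication is ignored. `token`, `TokenRing` and `owner_of_token` are hypothetical names.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token (SHA-256 as a stand-in for Murmur3)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class TokenRing:
    def __init__(self, nodes: List[Tuple[int, str]]):
        # Each node owns the token range (previous node's token, its own token];
        # the first node also owns everything above the last token (wrap-around).
        self._ring = sorted(nodes)
        self._tokens = [t for t, _ in self._ring]

    def owner_of_token(self, t: int) -> str:
        # First node whose token is >= t; past the last token, wrap to the first node.
        i = bisect.bisect_left(self._tokens, t)
        return self._ring[i % len(self._ring)][1]

    def owner(self, partition_key: str) -> str:
        return self.owner_of_token(token(partition_key))


ring = TokenRing([(-2**62, "node-A"), (0, "node-B"), (2**62, "node-C")])
print(ring.owner("user:42"))  # the same key always maps to the same node
```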

Manish Khandelwal · Accepted Answer

Cassandra does it for performance. Partitions can grow large, and to avoid scanning within a partition, Cassandra stores the rows of a partition sorted by clustering key, so a query that restricts clustering columns can seek directly to the matching range. You can refer to this link to understand how clustering columns are stored.
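
A minimal sketch of why sorted storage avoids a scan: the toy `Partition` class below (hypothetical, not Cassandra code) keeps rows ordered by clustering key and answers a range query with two binary searches. It uses a Python list, so inserts are O(n); Cassandra instead sorts writes in the memtable and flushes them to immutable, already-sorted SSTables.

```python
import bisect
from typing import Any, List, Tuple


class Partition:
    """Toy model of one partition: rows kept sorted by clustering key."""

    def __init__(self) -> None:
        self._keys: List[Tuple] = []
        self._values: List[Any] = []

    def insert(self, clustering_key: Tuple, value: Any) -> None:
        i = bisect.bisect_left(self._keys, clustering_key)
        if i < len(self._keys) and self._keys[i] == clustering_key:
            self._values[i] = value  # same key: overwrite (upsert)
        else:
            self._keys.insert(i, clustering_key)
            self._values.insert(i, value)

    def slice(self, start: Tuple, end: Tuple) -> List[Tuple[Tuple, Any]]:
        """Rows with start <= key < end, located by binary search instead of a full scan."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


p = Partition()
p.insert((2024, 3), "march")
p.insert((2024, 1), "january")
p.insert((2023, 12), "december")
# All rows for year 2024: the prefix (2024,) sorts before every (2024, month) key.
print(p.slice((2024,), (2025,)))  # [((2024, 1), 'january'), ((2024, 3), 'march')]
```

In this model, reading in reverse clustering order is just walking the same slice backwards, which matches Cassandra allowing ORDER BY only in the declared clustering order or its reverse.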

One more thing: apart from telling which node holds the data, the partition key is also used to determine which SSTables cannot contain the data, so those SSTables can be skipped on reads.
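
Cassandra does this with a Bloom filter per SSTable, built over the partition keys that SSTable contains: a negative answer means the key is definitely absent, so that SSTable is skipped. The sketch below is a toy version; SHA-256 replaces Cassandra's actual hashing, and the sizes and the names `BloomFilter`, `SSTable` and `sstables_to_read` are hypothetical.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    def __init__(self, name: str, partitions: Dict[str, list]) -> None:
        self.name = name
        self.partitions = partitions
        self.bloom = BloomFilter()
        for partition_key in partitions:
            self.bloom.add(partition_key)


def sstables_to_read(sstables: List[SSTable], partition_key: str) -> List[str]:
    """Names of SSTables that may hold the partition; the rest are skipped."""
    return [t.name for t in sstables if t.bloom.might_contain(partition_key)]


tables = [
    SSTable("sstable-1", {"user:1": ["row"], "user:2": ["row"]}),
    SSTable("sstable-2", {"user:3": ["row"]}),
    SSTable("sstable-3", {}),
]
print(sstables_to_read(tables, "user:1"))  # includes 'sstable-1'; empty 'sstable-3' is always skipped
```

A false positive only costs one wasted SSTable read; false negatives cannot happen, so skipping an SSTable on a negative answer is always safe.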
