Reputation: 1307
I have read that Cassandra columns are physically sorted. I think this holds only if a key's row is present in a single SSTable on a node. If the same key is present in multiple SSTables, with the same or different columns, the node itself has to sort the data out after reading from each SSTable. If this is correct, how can Cassandra's wide-row concept, which is used for column sorting/order-by purposes, be efficient?
Upvotes: 0
Views: 357
Reputation: 684
You are right that Cassandra keeps rows sorted on disk: within each partition, rows are ordered by the clustering columns. This reduces the number of disk seeks needed to satisfy a query.
You are also right that a partition can exist in multiple SSTables on disk. Each SSTable is sorted on disk, but when the node reads a partition it merges, in memory, the values from each SSTable plus any values for that partition in the memtable.
Compaction is designed to minimise the number of SSTables that exist, which keeps the number of disk seeks down. Reading from disk is likely to be slower than merging already-sorted data in memory, so the merge is usually not the expensive part of the read.
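To illustrate why merging already-sorted data is cheap, here is a minimal Python sketch of the idea, not Cassandra's actual code. It assumes each SSTable and the memtable can be modelled as a list of `(clustering_key, write_timestamp, value)` tuples already sorted by clustering key, and that the cell with the newest timestamp wins. The function name `merge_partition` and the sample data are made up for illustration, and real reads also deal with tombstones, bloom filters and so on, which are left out here.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_partition(*sorted_runs):
    """Merge runs of (clustering_key, write_timestamp, value) tuples.

    Each run is assumed to be sorted by clustering_key already, so a
    k-way merge yields globally sorted output without a full re-sort.
    """
    # heapq.merge streams the runs lazily, comparing only the clustering key.
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    result = []
    # Equal clustering keys arrive next to each other; keep the newest cell.
    for ckey, cells in groupby(merged, key=itemgetter(0)):
        newest = max(cells, key=itemgetter(1))
        result.append((ckey, newest[2]))
    return result


# Hypothetical data: two SSTables and a memtable holding the same partition.
sstable_1 = [("a", 1, "old-a"), ("c", 1, "c1")]
sstable_2 = [("a", 5, "new-a"), ("b", 2, "b2")]
memtable = [("d", 9, "d9")]

rows = merge_partition(sstable_1, sstable_2, memtable)
print(rows)
```

Because every input is already in clustering order, the merge only ever compares the head of each run, so the result comes out sorted without re-sorting the whole partition.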
Upvotes: 1