Cassandra access algorithm (and cost) of clustering key

Question

In most of the documentation about Cassandra I read that tables can be thought of as:

Map<PrimaryKey, SortedMap<ClusteringKey, Row>>

So I expect access by PrimaryKey to be something like O(log n). But what about the ClusteringKey (obviously when a PrimaryKey is also specified in the query)?

I mean: searching for

where primarykey=someval and clusteringkey=some_clustering_val

Would that be something like O(log n) + O(log n), no matter whether "some_clustering_val" sits at the beginning or at the end of the row according to the clusteringkey ordering?

I can't find proper documentation on how the data is actually fetched from the row...

Alex Ott · Accepted Answer

The read path is described, for example, in the DSE Architecture guide.

There are multiple things here that affect data access cost (only listing some of them):

A partition is often split across multiple SSTables, for example after updates or after inserts of different clustering keys at different times. It takes the structure you describe only once the compaction process has merged the data into a single SSTable.
Data is stored in compressed form, so the whole data block has to be decompressed to access the data for a particular clustering key.
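
To make the two points above concrete, here is a minimal Python sketch of a toy model, not Cassandra's actual on-disk format or read path. The names ToySSTable, read_row and CHUNK_ROWS are hypothetical, zlib stands in for whatever compressor the table uses, and real-world optimisations that let a read skip SSTables are left out. It only shows why the cost is not a plain O(log n) + O(log n): every SSTable holding a fragment of the partition is consulted, and each hit pays for decompressing a whole chunk.

```python
import bisect
import zlib

CHUNK_ROWS = 4  # hypothetical: number of rows packed into one compressed chunk


class ToySSTable:
    """One immutable, sorted fragment of a single partition (toy model only)."""

    def __init__(self, rows, timestamp):
        # rows: dict of clustering key (str) -> value (str), without tabs/newlines
        self.timestamp = timestamp
        items = sorted(rows.items())
        self.first_keys = []  # first clustering key of each chunk (a tiny index)
        self.chunks = []      # compressed payload of each chunk
        for i in range(0, len(items), CHUNK_ROWS):
            block = items[i:i + CHUNK_ROWS]
            self.first_keys.append(block[0][0])
            payload = "\n".join(f"{k}\t{v}" for k, v in block).encode()
            self.chunks.append(zlib.compress(payload))

    def get(self, clustering_key):
        """Return (value or None, number of bytes that had to be decompressed)."""
        # Binary search over the chunk index: logarithmic in the number of chunks.
        pos = bisect.bisect_right(self.first_keys, clustering_key) - 1
        if pos < 0:
            return None, 0
        # The whole chunk is decompressed even though only one row is wanted.
        raw = zlib.decompress(self.chunks[pos])
        for line in raw.decode().split("\n"):
            key, value = line.split("\t")
            if key == clustering_key:
                return value, len(raw)
        return None, len(raw)


def read_row(sstables, clustering_key):
    """Consult every SSTable fragment; the newest write for the key wins."""
    best = None
    total_bytes = 0
    for table in sstables:
        value, cost = table.get(clustering_key)
        total_bytes += cost
        if value is not None and (best is None or table.timestamp > best[0]):
            best = (table.timestamp, value)
    return (best[1] if best else None), total_bytes


# Usage: the same clustering key lives in two fragments until "compaction".
old = ToySSTable({f"c{i:02d}": f"v{i}" for i in range(10)}, timestamp=1)
new = ToySSTable({"c07": "updated"}, timestamp=2)
value, bytes_read = read_row([old, new], "c07")
```

In this sketch the lookup inside one fragment is logarithmic, but the total work grows with the number of fragments and with the chunk size, which is why the position of the value in the clustering order matters less than how fragmented and how compressed the partition is.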
