Leo

Reputation: 1136

Cassandra access algorithm (and cost) of clustering key

In most of the Cassandra documentation I have read, tables are described as something like:

Map<PrimaryKey, SortedMap<ClusteringKey>>

So I expect access by PrimaryKey to be something like O(log n). But what about the ClusteringKey (obviously when a PrimaryKey is also specified in the query)?

I mean: searching for

where primarykey=someval and clusteringkey=some_clustering_val 

would be something like O(log n)+O(log n), no matter whether the "some_clustering_val" value is at the end or at the beginning of the row according to the clusteringkey ordering?
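To make the model in the question concrete, here is a minimal sketch in Python of a map from partition key to a sorted collection of clustering keys. It illustrates only the idealized mental model, not how Cassandra stores data on disk; the names `Partition`, `table`, `insert` and `select` are made up for this example.

```python
import bisect

class Partition:
    """Toy partition: rows kept sorted by clustering key."""

    def __init__(self):
        self.keys = []  # sorted clustering keys
        self.rows = []  # row values, parallel to self.keys

    def put(self, clustering_key, row):
        # Binary search for the slot, then overwrite or insert in order.
        i = bisect.bisect_left(self.keys, clustering_key)
        if i < len(self.keys) and self.keys[i] == clustering_key:
            self.rows[i] = row
        else:
            self.keys.insert(i, clustering_key)
            self.rows.insert(i, row)

    def get(self, clustering_key):
        # Binary search: O(log n) in the number of rows of this partition.
        i = bisect.bisect_left(self.keys, clustering_key)
        if i < len(self.keys) and self.keys[i] == clustering_key:
            return self.rows[i]
        return None

table = {}  # partition key -> Partition (a hash lookup in this sketch)

def insert(pk, ck, row):
    table.setdefault(pk, Partition()).put(ck, row)

def select(pk, ck):
    partition = table.get(pk)
    return None if partition is None else partition.get(ck)

insert("someval", 1, "first")
insert("someval", 500, "middle")
insert("someval", 1000, "last")
print(select("someval", 1), select("someval", 1000))
```

In this idealized model the lookup cost depends only on the number of rows in the partition, not on whether the requested clustering key sorts first or last.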

I can't find proper documentation on how the data is actually fetched from the row...

Upvotes: 0

Views: 153

Answers (1)

Alex Ott

Reputation: 87224

The read path is described, for example, in the DSE Architecture guide.

Several things affect the cost of data access (this list is not exhaustive):

  • A partition is often split across multiple SSTables, for example after updates or after inserting different clustering keys at different times. It only takes the structure you describe once compaction has merged the data into a single SSTable
  • Data is stored in compressed form, so the whole compressed block has to be decompressed to access a particular clustering column's data
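The two points above can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: each "SSTable" is a single zlib-compressed JSON block, and `make_sstable` and `read_row` are hypothetical helpers, not Cassandra APIs. A real node keeps additional structures (bloom filters, indexes) to avoid touching SSTables and blocks it does not need.

```python
import json
import zlib

def make_sstable(rows):
    """Toy 'SSTable': one compressed block of {clustering_key: [timestamp, value]}."""
    payload = json.dumps(rows).encode("utf-8")
    return zlib.compress(payload)

def read_row(sstables, clustering_key):
    """Look in every SSTable and return the value with the newest timestamp."""
    best = None
    for block in sstables:
        # The whole block is decompressed just to reach a single row.
        rows = json.loads(zlib.decompress(block).decode("utf-8"))
        hit = rows.get(clustering_key)
        if hit is not None and (best is None or hit[0] > best[0]):
            best = hit
    return None if best is None else best[1]

# The same partition spread over two SSTables: an initial write, then an update.
sstables = [
    make_sstable({"ck1": [1, "old"], "ck2": [1, "b"]}),
    make_sstable({"ck1": [2, "new"]}),
]
print(read_row(sstables, "ck1"))
```

In this sketch the read cost grows with the number of SSTables that hold parts of the partition and with the size of each compressed block, in addition to the search within the partition itself.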

Upvotes: 2
