pree

Reputation: 2357

Cassandra - Number of disk seeks in a read request

I'm trying to understand the maximum number of disk seeks required in a read operation in Cassandra. I looked at several online articles including this one: https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlAboutReads.html

As I understand it, two disk seeks are required in the worst case: one to read the partition index and another to read the actual data from the compressed partition. The offset of the data in the compressed partition is obtained from the compression offset map, which is stored in memory. Am I on the right track here? Will there ever be a case where more than one disk seek is required to read the data?

Upvotes: 2

Views: 192

Answers (1)

pree

Reputation: 2357

I'm posting the answer I received from the Cassandra user community thread here, in case someone else needs it:

You're right: one seek with a hit in the partition key cache, and two if not.
That's the theory, but there are two things to mention:

First, you need two seeks per SSTable, not per entire read. So if your data is spread over multiple SSTables on disk, you obviously need more than two seeks. Think of frequently updated partition keys: in combination with memory pressure, you can easily end up with many SSTables (though they will be compacted at some point in the future).

Second, there could be fragmentation on disk, which leads to extra seeks even during sequential reads.

Note: Each SSTable has its own partition index.
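
To make the per-SSTable arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is only a simplified model of the reasoning above, not anything Cassandra exposes: the function name `estimate_seeks` and its input are my own illustration, and it deliberately ignores disk fragmentation, the OS page cache, and any SSTables that are skipped without touching disk.

```python
def estimate_seeks(key_cache_hits):
    """Estimate disk seeks for one read under the simplified model above.

    key_cache_hits: one bool per SSTable that has to be consulted.
    True means the partition key cache already holds the data offset
    (1 seek for the data); False means the partition index must be
    read from disk first (2 seeks: index, then data).
    """
    return sum(1 if hit else 2 for hit in key_cache_hits)


# One SSTable, key cache hit: a single seek for the data.
print(estimate_seeks([True]))

# One SSTable, key cache miss: index seek plus data seek.
print(estimate_seeks([False]))

# Partition spread over three SSTables, with a mix of hits and misses.
print(estimate_seeks([False, True, False]))
```

So under this model the "two seeks" figure is really an upper bound per SSTable, and the total grows with the number of SSTables the partition is spread across.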

Upvotes: 0
