SergeyT

Reputation: 51

What is the correct behavior on Cassandra read timeout?

Our setup is 6 nodes, 3 per DC, with 3-way replication per DC. We write with EACH_QUORUM and read with LOCAL_QUORUM.

Occasionally, we get read timeouts with an error:

com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra time out during read query at consistency ALL (6 responses were required but only 5 replica responded)

We found that this error does not actually mean what it says (see CASSANDRA-7947). Instead, it means that a read repair was triggered and failed to complete in time.

When we fail with this exception, we retry with a QUORUM read, and that later fails with an identical exception.

We verified that the writes and reads happen in the same DC and not cross-DC (if this matters). Also, the read follows the write; they do not happen in parallel.

Any pointers on how we should handle this?

  1. Should we retry a couple of times with LOCAL_QUORUM?
  2. Should we increase the timeout?
  3. Should we jump off a cliff?

Any advice would be greatly appreciated.

The table schema is along these lines:

CREATE TABLE records (
    firstKey text,
    secondKey text,
    data blob,
    PRIMARY KEY (firstKey, secondKey)
) WITH read_repair_chance = 0.0
   AND dclocal_read_repair_chance = 0.1
   AND gc_grace_seconds = 864000
   AND bloom_filter_fp_chance = 0.01
   AND caching = { 'keys' : 'ALL', 'rows_per_partition' : 'NONE' }
   AND comment = ''
   AND compaction = { 'class' : 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold' : 32, 'min_threshold' : 4 }
   AND compression = { 'chunk_length_in_kb' : 64, 'class' : 'org.apache.cassandra.io.compress.LZ4Compressor' }
   AND default_time_to_live = 432000
   AND speculative_retry = '99PERCENTILE'
   AND min_index_interval = 128
   AND max_index_interval = 2048
   AND crc_check_chance = 1.0
   AND cdc = false;

Query that times out:

select * from records where firstKey=XXXXXXX

Upvotes: 1

Views: 1531

Answers (1)

Andrea Nagy

Reputation: 1231

Before you jump off the cliff, what I would do in your case:

1.) Try to read with LOCAL_QUORUM and, in case of failure, retry with LOCAL_QUORUM or even with ONE. QUORUM is stronger than LOCAL_QUORUM, so I do not think that retrying with a stronger consistency level would help in this case. You write with EACH_QUORUM, so if your replication factor really is 3, LOCAL_QUORUM reads are strongly consistent (2 read replicas + 2 write replicas > 3). A read at ONE is not guaranteed to see the latest write (1 + 2 = 3), so fall back to ONE only if slightly stale data is acceptable. https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html

2.) I would also look into how you fetch your data; for example, fetching a huge amount of data at once can also result in timeouts.

3.) If there is nothing to improve and you are still experiencing timeouts, even with ONE consistency, then I would suggest checking your Cassandra driver, as it might have a driver-side timeout parameter. As a last resort, I would increase range_request_timeout_in_ms and read_request_timeout_in_ms in cassandra.yaml.
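
To make the retry advice concrete, here is a minimal sketch in plain Java. It does not use the DataStax driver: `Level`, `ReadTimeout`, and `readWithRetry` are hypothetical stand-ins I made up for illustration, and the query is just a function you pass in. It also shows the usual overlap rule (read replicas + write replicas > replication factor). With a replication factor of 3 and quorum writes, a LOCAL_QUORUM read satisfies the rule and a read at ONE does not, so treat ONE as an availability fallback only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class ReadRetrySketch {

    // Hypothetical stand-in for the driver's consistency levels (not the driver's enum).
    enum Level { ONE, LOCAL_QUORUM }

    // Hypothetical stand-in for the driver's ReadTimeoutException.
    static class ReadTimeout extends RuntimeException {
        ReadTimeout(String message) { super(message); }
    }

    // Number of replicas that form a quorum for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to touch a replica holding the latest write when R + W > RF.
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    // Runs the query once per level in the schedule, in order.
    // Returns the first successful result; rethrows the last timeout if all attempts fail.
    static <T> T readWithRetry(Function<Level, T> query, List<Level> schedule) {
        ReadTimeout last = null;
        for (Level level : schedule) {
            try {
                return query.apply(level);
            } catch (ReadTimeout e) {
                last = e; // remember the failure and move on to the next level
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("schedule must not be empty");
        }
        throw last;
    }

    public static void main(String[] args) {
        int rf = 3;                     // replication factor per DC, as in the question
        int writeReplicas = quorum(rf); // EACH_QUORUM needs a quorum in every DC
        System.out.println("LOCAL_QUORUM read overlaps write: "
                + overlaps(quorum(rf), writeReplicas, rf));
        System.out.println("ONE read overlaps write: "
                + overlaps(1, writeReplicas, rf));

        // Simulated query: the first attempt times out, the second succeeds.
        int[] attempts = {0};
        String row = readWithRetry(level -> {
            if (attempts[0]++ == 0) {
                throw new ReadTimeout("simulated timeout");
            }
            return "row read at " + level;
        }, Arrays.asList(Level.LOCAL_QUORUM, Level.LOCAL_QUORUM, Level.ONE));
        System.out.println(row);
    }
}
```

In real code the function passed to `readWithRetry` would run your statement with the given consistency level through the driver, and you would probably add a short pause between attempts. The point of the fixed schedule is that it never escalates to a stronger level than the one that already timed out.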

Upvotes: 1
