Reputation: 542
I'm seeing really odd behavior from a Cassandra 2.x cluster.
If I execute the following query (with CL ONE or LOCAL_ONE):

    select count(*) from api_cache;
The query can randomly take 20ms or 800ms on the same node (the table has only around 600 rows). In the fast case the trace shows around 100 sequential scans being executed; in the slow case, around 2,000. I have no idea why the system should exhibit such non-deterministic behavior.
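A minimal sketch of how the measurement can be reproduced in cqlsh; the keyspace name mykeyspace is a placeholder, since the post doesn't name it:

    -- run in cqlsh on one of the nodes; "mykeyspace" is a placeholder
    USE mykeyspace;
    CONSISTENCY LOCAL_ONE;  -- or CONSISTENCY ONE
    TRACING ON;             -- prints the per-step trace, including each sequential scan
    SELECT count(*) FROM api_cache;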
This problem only occurs in DC2 (5 nodes, RF=5), whereas in DC1 (3 nodes, RF=3) the same query always returns in around 15ms. Other points worth mentioning: DC2 was created by streaming from DC1, and the keyspace uses size-tiered compaction (SizeTieredCompactionStrategy). All the nodes have 16GB of RAM, a dedicated SSD for /var/lib/cassandra, and are under very light load.
I have tried compacting, repairing, scrubbing, etc., all with no effect; the sort of commands I mean are sketched below.
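For reference (again, the keyspace name is assumed):

    # run on each affected node; "mykeyspace" is a placeholder
    nodetool compact mykeyspace api_cache
    nodetool repair mykeyspace api_cache
    nodetool scrub mykeyspace api_cache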
Any help would be massively appreciated,
Thanks.
Here are the CQL tracings for both cases (edit: each from DC1)
UPDATE:
This is a good clue: if I run the query from DC1 with CL=ALL, it returns in around 30ms every time. In other words, querying DC2 from DC1 is an order of magnitude quicker than querying DC2 from DC2!
Bearing in mind the DCs are 100 miles apart, at least we can discount network and disk issues on any of the nodes...
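For concreteness, the CL=ALL run looks like this from a cqlsh session on a DC1 node (the session framing is illustrative):

    -- with RF=3 in DC1 and RF=5 in DC2, CL=ALL requires all 8 replicas to respond
    CONSISTENCY ALL;
    SELECT count(*) FROM api_cache;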
The key difference is that DC1 only ever scans a single range, e.g. [min(-9223372036854775808), min(-9223372036854775808)], whereas DC2 scans many separate ranges, e.g. (max(4772134901608081021), max(4787315709362044510)]. There is also the fact that one uses min while the other uses multiple max values...
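If anyone wants to dig into the ranges themselves, the token ranges each node owns can be listed with nodetool (keyspace name assumed); with vnodes enabled a node owns many small ranges rather than a single contiguous span:

    # lists every token range in the keyspace and which endpoints own it
    nodetool describering mykeyspace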
This is clearly the heart of the problem, but I don't know how to fix it!
Upvotes: 0
Views: 717
Reputation: 542
This turned out to be a regression introduced somewhere between 2.0.4 and 2.0.9.
The query planner is no longer able to compute the contiguous token ranges owned by a given node, so instead of issuing one scan per contiguous span it frequently scans each range individually.
See CASSANDRA-7535 for more details.
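To check whether a given node is running an affected release, the standard version command suffices:

    # releases between roughly 2.0.4 and 2.0.9 exhibit the regression
    nodetool version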
Upvotes: 1
Reputation: 491
count(*) performance depends on how many SSTables you have. I looked at your trace output, and it's clear that DC1 has few SSTables while DC2 has many; each one must be checked for rows to arrive at a total. Having vnodes enabled amplifies the problem by splitting the data across many more token ranges.
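This is easy to verify; assuming the keyspace is called mykeyspace, compare the "SSTable count" line on a DC1 node with one from a DC2 node:

    # prints per-table statistics, including "SSTable count"
    nodetool cfstats mykeyspace.api_cache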
Keeping an accurate count would require a read-modify-write and extra locking on every mutation, so Cassandra does not do it. count(*) is really an estimation based on the per-SSTable/memtable row counts, which are themselves estimates.
For more information, please read this: http://www.wentnet.com/blog/?p=24
Upvotes: 2