Anurag Rana

Reputation: 1466

Read latency in Cassandra cluster - too many SSTables

We are facing read latency issues on our Cassandra cluster. One of the causes I read about is too many SSTables being touched per read query. According to the documentation available online, the 99th percentile of read queries should touch only 1-3 SSTables. In my case, however, reads are touching up to 20 SSTables.

(I have already tuned other parameters such as read-ahead and concurrent-read threads.)

Here is the output of the tablehistograms command for one of the tables.

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%            10.00             51.01          43388.63               179                 3
75%            14.00             73.46          62479.63               642                12
95%            17.00            126.93         107964.79              5722               124
98%            20.00            152.32         129557.75             14237               310
99%            20.00            182.79         129557.75             24601               535
Min             0.00             14.24             51.01                51                 0
Max            24.00          74975.55         268650.95          14530764            263210

At first I thought compaction might be lagging, but that is not the case: the compactionstats command always shows 0 pending tasks. I increased the compaction throughput and concurrent compactors just to be on the safe side.

CPU usage, memory usage, and disk IO/IOPS are under control.

We are using the default compaction strategy. Here is the table metadata.

AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND dclocal_read_repair_chance = 0.1
AND default_time_to_live = 7776000
AND gc_grace_seconds = 86400
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99PERCENTILE';

Also, according to the compaction history, compaction runs on some tables once a day, and on another table only once every 3 days.

It looks like the SSTable sizes are not similar enough to trigger compaction.

Can you please suggest what can be done here to reduce the number of SSTables?

Upvotes: 2

Views: 1285

Answers (3)

Erick Ramirez

Reputation: 16293

You need to make sure that your queries retrieve data from a single partition; otherwise they will significantly affect the performance of your cluster.
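
For illustration (the actual schema isn't shown in the question, so the keyspace, table and key names below are hypothetical), a single-partition read restricts the partition key in the WHERE clause, while an unrestricted query has to fan out to every node:

-- Illustration only: hypothetical keyspace, table and key names.
-- Single-partition read: the coordinator only contacts the replicas owning user_id = 42.
SELECT * FROM my_keyspace.events WHERE user_id = 42;

-- Unrestricted query: every node has to be consulted, which is far more expensive.
SELECT * FROM my_keyspace.events;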

If your queries target one partition only but still need to retrieve data from 20 SSTables, it indicates to me that you are constantly inserting/updating partitions and the data gets fragmented across multiple SSTables, so Cassandra has to retrieve all the fragments and coalesce them before returning the results to the client.

If the SSTables are very small (only a few kilobytes), there's a good chance that your cluster is getting overloaded with writes and nodes are constantly flushing memtables to disk, so you end up with tiny SSTables.

If the SSTables are not getting compacted together, it means that the file sizes are widely different. By default, SSTables get merged together if their sizes are within 0.5-1.5x the average SSTable size.
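
If that turns out to be the cause, the bucketing thresholds can be widened so that files of more dissimilar sizes still fall into the same bucket. A rough sketch only (the table name is hypothetical; bucket_low and bucket_high are standard STCS sub-options with defaults of 0.5 and 1.5):

-- Hypothetical table name; widening bucket_low/bucket_high lets files of more
-- dissimilar sizes be grouped into the same compaction bucket.
ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                   'min_threshold': '4',
                   'max_threshold': '32',
                   'bucket_low': '0.3',
                   'bucket_high': '2.0'};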

Alex Ott's suggestion to reduce min_threshold to 2 will help speed up compactions. But you really need to address the underlying issue. Don't be tempted to run nodetool compact without understanding the consequences and tradeoffs as I've discussed in this post. Cheers!

Upvotes: 1

dor laor

Reputation: 870

You can try moving to the leveled compaction strategy; it's a better fit if you have lots of updates. Another option is to force a major compaction. In ScyllaDB we have the incremental compaction strategy, which combines the best of size-tiered and leveled compaction.
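
A minimal sketch of the switch (table name hypothetical; 160 MB is the usual default for sstable_size_in_mb, and the change will trigger a one-off recompaction of the existing SSTables):

-- Hypothetical table name; switching the strategy rewrites existing SSTables,
-- so do this during a quiet period.
ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'LeveledCompactionStrategy',
                   'sstable_size_in_mb': '160'};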

Upvotes: 0

Alex Ott

Reputation: 87119

You can make compaction a bit more aggressive by changing the min_threshold parameter of the compaction settings. In the default configuration, compaction waits until there are at least 4 files of similar size available and only then triggers. Start with 3; maybe you can lower it to 2, but you really need to track resource consumption so compaction won't add a lot of overhead.
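
For reference, this is a per-table setting, along these lines (table name hypothetical; only min_threshold is changed from the settings shown in the question):

-- Hypothetical table name; lowers the number of similarly-sized SSTables
-- required before a size-tiered compaction kicks in.
ALTER TABLE my_keyspace.my_table
WITH compaction = {'class': 'SizeTieredCompactionStrategy',
                   'min_threshold': '3',
                   'max_threshold': '32'};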

Check this document from the DataStax field team, who have done a lot of tuning for DataStax customers.

Upvotes: 2
