Reputation: 414
We have a couple of tables with Leveled compaction strategy and SizeTiered compaction strategy. How often do we need to run compaction? Thanks in advance
Upvotes: 0
Views: 2066
Reputation: 7305
Compaction runs on its own (as long as you did not disable autocompaction in the yaml).
Per the cassandra write path, we flush memtables to disk periodically into SSTables (sorted string tables) which are immutable. When you update an existing cell, it eventually gets written in an sstable. Possibly a different one than the original record. When we read, sometimes C* has to scan across various sstables (with some optimizations, see bloom filters) to find the latest version of a cell. In Cassandra, last write wins.
Compaction takes sstables and compacts them together removing duplicate data, to optimize reads. This is an automatic operation, though you can tune compactions to run more or less often.
Size tiered compaction is the default compaction strategy in cassandra, it looks for sstables that are the same size and compacts them together when it finds enough (4 by default). Size tiered is less IO intensive than leveled and will work better in general when you have smaller boxes and rotational drives.
Leveled compaction is optimized for reads, when you have read heavy workloads or tight read SLA's with lots of updates leveled may make sense. Leveled compaction is more IO and CPU intensive because you are spending more cycles optimizing for reads, but the reads themselves should be faster and hit fewer SStables. Keep an eye on io wait and on pending compactions in nodetool compaction stats
when you first enable these or if your workload grows.
multi threaded compaction - turn it off, the overhead is bigger than the benefit. To the point where it's been removed in C* 2.1.
concurrent compactors - now defaults to 2, used to default to number of cores which is a bad default. If you're on the 2.0 branch and not running the latest DSE check this default and consider decreasing it to 2. this is the number of simultaneous compaction tasks you can run (different column families).
Compaction throttling - a way of limiting the amount of resources that compactions take up. You can tune this on the fly with nodetool getcompactionthreshold
and nodetool setcompactionthreshold
. You want to tune this to a point where you are not accumulating pending tasks. 0 --> unlimited. Unlimited is, unintuitively, not usually the fastest setting as the system may get bogged down.
Upvotes: 5