I have a six-node Cassandra cluster hosting a large column family (CQL table) that is immutable, because from the application's point of view it is a kind of history table. The table holds about 400 GB of compressed data, which is not that much!
After truncating the table and ingesting the application's history data into it, I run nodetool compact on the table on each node. The goal is the best read performance, by reducing the number of SSTables. The compaction strategy is STCS (SizeTieredCompactionStrategy).
After running nodetool compact, I use nodetool compactionstats to follow the compaction's progress:
    id    compaction type   keyspace     table     completed   total      unit    progress
    xxx   Compaction        mykeyspace   mytable   3.65 GiB    1.11 TiB   bytes   0.32%
Hours later, the same node shows:
    id    compaction type   keyspace     table     completed   total      unit    progress
    xxx   Compaction        mykeyspace   mytable   4.08 GiB    1.11 TiB   bytes   0.36%
So the compaction is progressing, but terribly slowly.
Even with nodetool setcompactionthreshold -- 0, the compaction remains terribly slow. Moreover, the compaction seems to drive CPU usage to 100%.
Questions:
Compaction performance depends on the underlying hardware, for example on what kind of disks are used. But it also depends on how many compaction threads are allowed to run and what throughput is configured for them. From the command line, compaction throughput is configured with nodetool setcompactionthroughput, not with nodetool setcompactionthreshold as you used. The number of concurrent compactors is set with nodetool setconcurrentcompactors (available from 3.1 on, IIRC). You can also configure the default values in cassandra.yaml.
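Because the throughput setting acts as a cap, dividing the work by it gives a best-case duration. This Python sketch assumes the 1.11 TiB total reported by compactionstats and, for simplicity, treats MB as MiB (1024 × 1024 bytes). The 64 and 256 MB/s values are arbitrary examples; 16 MB/s is the default in the stock 3.x cassandra.yaml (`compaction_throughput_mb_per_sec`). `hours_at_throughput_cap` is a hypothetical helper.

```python
MIB_PER_TIB = 1024 * 1024


def hours_at_throughput_cap(total_tib: float, cap_mb_per_s: float) -> float:
    """Best-case compaction time when the throughput cap is the only limit.

    MB is treated as MiB here (a simplifying assumption). Real compactions are
    also bounded by disk and CPU, so they can take longer than this.
    """
    if cap_mb_per_s <= 0:
        raise ValueError("0 disables throttling; there is no cap to compute against")
    return total_tib * MIB_PER_TIB / cap_mb_per_s / 3600


for cap in (16, 64, 256):  # 16 MB/s is the stock 3.x default; others are examples
    print(f"{cap:>4} MB/s -> {hours_at_throughput_cap(1.11, cap):6.1f} h")
```

At 16 MB/s the whole 1.11 TiB would take roughly 20 hours. The progress in the question is far slower than that, so, in line with the CPU remark, the throttle may not be the only limit on that node.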
So if you have enough CPU power and good SSDs, you can bump both the compaction throughput and the number of compactors.
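A minimal Python sketch of applying this to all six nodes. The host names in `NODES` and the values 64 MB/s and 4 compactors are hypothetical placeholders, not recommendations, and by default the script only prints the nodetool commands instead of running them. Settings changed with nodetool apply at runtime only and revert to the cassandra.yaml values on restart, so persist them there (`compaction_throughput_mb_per_sec` and `concurrent_compactors` in 3.x) if you want to keep them.

```python
import shlex
import subprocess

# HYPOTHETICAL host names for the six-node cluster; replace with your own.
NODES = [f"cass-node-{i}" for i in range(1, 7)]


def tuning_commands(host: str, throughput_mb_per_s: int, compactors: int) -> list[list[str]]:
    """nodetool invocations that raise compaction limits on one node (runtime only)."""
    return [
        ["nodetool", "-h", host, "setcompactionthroughput", str(throughput_mb_per_s)],
        ["nodetool", "-h", host, "setconcurrentcompactors", str(compactors)],
    ]


def apply_tuning(nodes, throughput_mb_per_s, compactors, dry_run=True):
    """Print (dry run) or execute the tuning commands for every node."""
    for host in nodes:
        for cmd in tuning_commands(host, throughput_mb_per_s, compactors):
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)


# Example values only; check CPU and disk headroom before raising them.
apply_tuning(NODES, throughput_mb_per_s=64, compactors=4)
```

Passing 0 to setcompactionthroughput disables throttling entirely, which is only sensible when the disks and CPU can absorb it.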