Reputation: 73
We have a 14-node Cassandra cluster running v3.5. Can someone explain compaction and repair?
Upvotes: 2
Views: 1579
Reputation: 16293
Compactions are part of the normal operation of Cassandra nodes. They run automatically in the background (otherwise known as minor compactions) and get triggered by each table's defined compaction strategy based on any combination of configured thresholds and compaction sub-properties. This video extract from the DS201 Cassandra Foundations course at the DataStax Academy talks about compactions in more detail.
It is not necessary for an operator/administrator to manually kick off compactions with nodetool compact. In fact, it is not recommended to trigger user-defined compactions (otherwise known as major compactions) because they can create problems down the track like the one I explained in this post: https://community.datastax.com/questions/6396/.
Repairs, on the other hand, need to be managed by a cluster administrator. Since Cassandra has a distributed architecture, it is necessary to run repairs to keep the copies of the data consistent between replicas (Cassandra nodes).
Repairs need to be run at least once every gc_grace_seconds (configured per table). By default, GC grace is 10 days (864000 seconds) so most DB admins run a repair on each node once a week. This short video from the DS210 Cassandra Operations course provides a good overview of Cassandra repairs.
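To make the arithmetic concrete, here is a small Python sketch that converts gc_grace_seconds to days and checks a planned repair interval against it. The function names are made up for illustration, and it assumes the repair cycle itself finishes well inside the interval.

```python
SECONDS_PER_DAY = 86_400
DEFAULT_GC_GRACE_SECONDS = 864_000  # Cassandra's default per-table value


def gc_grace_days(gc_grace_seconds: int) -> float:
    """Convert a table's gc_grace_seconds setting to days."""
    return gc_grace_seconds / SECONDS_PER_DAY


def repair_interval_is_safe(interval_days: float,
                            gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """True if repairs scheduled every interval_days fit inside the GC grace window.

    Note: this ignores how long the repair itself takes to complete.
    """
    return interval_days <= gc_grace_days(gc_grace_seconds)


if __name__ == "__main__":
    print(gc_grace_days(DEFAULT_GC_GRACE_SECONDS))  # 10.0
    print(repair_interval_is_safe(7))               # True
    print(repair_interval_is_safe(14))              # False
```

With the default setting, a weekly schedule leaves roughly three days of slack before the 10-day window closes.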
Running a partitioner range repair (with the -pr flag) on a node repairs only the data that a node owns so it is necessary to run nodetool repair -pr on each node, one node at a time, until all nodes in the cluster have been repaired. This blog post by Jeremiah Jordan is a good explanation of why this is necessary.
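As a rough illustration of "one node at a time", the sketch below builds a nodetool repair -pr command per node and runs them sequentially. The host addresses are hypothetical, and it assumes nodetool is on the PATH and can reach each node over JMX via -h; many people instead log in to each node and run the command locally. It defaults to a dry run that only returns the commands.

```python
import subprocess
from typing import List


def repair_command(host: str) -> List[str]:
    """Build the primary-range repair command for one node."""
    # -h picks the node to connect to; -pr limits repair to that node's primary ranges.
    return ["nodetool", "-h", host, "repair", "-pr"]


def rolling_repair(hosts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Repair nodes strictly one after another; return the commands used."""
    commands = []
    for host in hosts:
        cmd = repair_command(host)
        commands.append(cmd)
        if not dry_run:
            # subprocess.run blocks until the repair ends, so the next node
            # starts only afterwards; check=True aborts the loop on failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical addresses
    for cmd in rolling_repair(hosts):  # dry run: print only
        print(" ".join(cmd))
```

Every node has to appear in the list, since -pr on a subset of nodes leaves the other nodes' primary ranges unrepaired.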
If you're interested, datastax.com/dev has free resources for learning Cassandra. The Cassandra Fundamentals series in particular is a good place to start. It is a collection of short online tutorials where you can quickly learn the basic concepts for free. Cheers!
Upvotes: 4
Reputation: 2310
If I run nodetool compact from one of the nodes, does it need to be run from all the nodes in the cluster? I see it is very slow. How often is this supposed to be run?
You should generally not run the nodetool compact command. Compactions run automatically in the background by default unless they have been disabled. Running compaction manually may create more problems and should be avoided in most cases; the automatic background compactions should be able to handle your workload. If you feel your compactions are slow, you can tune them through the compaction-related settings (mainly concurrent_compactors and compaction_throughput_mb_per_sec).
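Before changing any settings, it usually helps to look at what compaction is doing. The sketch below only builds the relevant nodetool commands (compactionstats, getcompactionthroughput, setcompactionthroughput); the helper names and host are hypothetical. Note that setcompactionthroughput changes the throttle at runtime only, so a permanent change still belongs in compaction_throughput_mb_per_sec in cassandra.yaml.

```python
from typing import List


def compaction_check_commands(host: str) -> List[List[str]]:
    """Commands to inspect pending compactions and the current throttle."""
    return [
        ["nodetool", "-h", host, "compactionstats"],
        ["nodetool", "-h", host, "getcompactionthroughput"],
    ]


def set_throughput_command(host: str, mb_per_sec: int) -> List[str]:
    """Command to change the compaction throttle at runtime (0 = unthrottled)."""
    if mb_per_sec < 0:
        raise ValueError("throughput must be zero or positive")
    return ["nodetool", "-h", host, "setcompactionthroughput", str(mb_per_sec)]


if __name__ == "__main__":
    host = "10.0.0.1"  # hypothetical address
    for cmd in compaction_check_commands(host):
        print(" ".join(cmd))
    print(" ".join(set_throughput_command(host, 32)))
```

Raising the throttle trades disk and CPU headroom for faster compaction, so it is worth watching compactionstats after any change.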
Same question regarding nodetool repair (all nodes or only certain nodes in the cluster?): nodetool repair or nodetool repair -pr, and how often is this supposed to be run?
Repair is a maintenance task that should be run on all the nodes once within each gc_grace_seconds period. For example, the default gc_grace_seconds is 10 days, so repair must be run on every node once within that 10-day period. Schedule your repairs to run regularly, once per gc_grace_seconds period. As for which option to use: if you are running repairs yourself, run nodetool repair -pr on all the nodes, one by one.
Upvotes: 2