Reputation: 2654
I have a 3-node cluster. Two of the three nodes show 100% CPU usage.
It seems we didn't call repair and cleanup after changing the consistency level (or we called them too late, or they didn't complete).
Now we have more than 100k compaction tasks pending, and they eat 100% of the CPU.
I tried the following:
nodetool stop -- COMPACTION
nodetool stop -- INDEX_BUILD
nodetool stop -- VALIDATION
nodetool stop -- CLEANUP
nodetool stop -- SCRUB
No change, and no error either. The only message I got was:
No files to compact for user defined compaction
What is the issue? How can I cancel the ongoing jobs?
Upvotes: 3
Views: 7122
Reputation: 16420
Calling nodetool stop COMPACTION would stop the compactions that are currently running. If you don't want it to start new compactions, use nodetool disableautocompaction. You can then verify with nodetool compactionstats.
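As a rough sketch, the three commands can be wrapped in a small shell function. The NODETOOL variable and the function names are only illustrative hooks, not part of Cassandra, and disabling autocompaction before stopping the running tasks is my own ordering choice. The pending_tasks helper assumes the compactionstats output contains a line of the form "pending tasks: N".

```shell
# NODETOOL is a hypothetical override hook for this sketch; it defaults to the real binary.
NODETOOL="${NODETOOL:-nodetool}"

halt_compactions() {
    "$NODETOOL" disableautocompaction || return 1   # keep new automatic compactions from starting
    "$NODETOOL" stop COMPACTION || return 1          # interrupt the compactions running right now
    "$NODETOOL" compactionstats                      # show what is still active or pending
}

# Print only the pending-task count (assumes a "pending tasks: N" line in the output).
pending_tasks() {
    "$NODETOOL" compactionstats | awk -F': *' '/^pending tasks:/ { print $2; exit }'
}
```

Run halt_compactions on the affected node, then call pending_tasks every so often to see whether the backlog is moving at all.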
I am certain that this is not your problem, however. With 100k pending compactions you will have far too many sstables, and your node is hopelessly behind. Any reads at all will cause massive load. Also, unless you have a huge heap, just trying to read from them will likely leave you low on heap space and cause GC issues. The GCs are likely the cause of your high load. Check where your CPU time is going: if it's being spent in IO wait, it's likely from reads or streaming; if it's in sys/usr, it's probably GCs. If it is a GC issue, you can take a heap dump to verify what's taking all the space.
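To make the IO versus sys/usr check concrete, here is a minimal Linux-only sketch that splits the aggregate "cpu" line of /proc/stat into user+system time and IO wait. The function name cpu_split is made up for illustration. The counters are cumulative since boot, so this only gives a coarse picture; top or vmstat show the current split.

```shell
# Summarise the aggregate "cpu" line of /proc/stat (Linux only).
# Fields after the label: user nice system idle iowait irq softirq steal ...
# Only fields 2..9 are summed; the guest fields are already counted in user/nice.
cpu_split() {
    awk '$1 == "cpu" {
        total = 0
        for (i = 2; i <= 9 && i <= NF; i++) total += $i
        if (total == 0) exit 1
        printf "usr+sys=%d%% iowait=%d%%\n", ($2 + $4) * 100 / total, $6 * 100 / total
        exit
    }'
}
```

Usage would be cpu_split < /proc/stat on the busy node. A large usr+sys share with little iowait points towards GC rather than disk.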
With 100k pending, your node will probably never recover on its own. Your best bet is probably to run nodetool disablebinary/disablethrift/disablegossip, then use nodetool compact to force a compaction of all sstables. Depending on version and compaction strategy it may not work, but you can use JMX to change the compaction strategy to STCS locally, for that node only, to make it work. If this can't be completed within the hinted handoff window, it's not worth the trouble of trying to make your cluster consistent again. Also, this will only work if the load goes down once the node is removed from the cluster.
Upvotes: 6
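The isolate-then-compact sequence from the answer above could be sketched roughly like this. NODETOOL and isolate_and_compact are illustrative names, not Cassandra features. A failure of disablethrift is tolerated on the assumption that the command is missing on versions that ship without Thrift.

```shell
# NODETOOL is a hypothetical override hook for this sketch; it defaults to the real binary.
NODETOOL="${NODETOOL:-nodetool}"

isolate_and_compact() {
    "$NODETOOL" disablebinary || return 1   # stop serving native-protocol (CQL) clients
    "$NODETOOL" disablethrift || true       # assumption: may not exist on Thrift-less versions
    "$NODETOOL" disablegossip || return 1   # make the node appear down to the rest of the ring
    "$NODETOOL" compact                     # force a major compaction of all sstables
}
```

Once the backlog is gone, nodetool enablegossip and nodetool enablebinary bring the node back into the cluster.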