Cassandra node heap pressure during compaction after bulk load

Question

After bulk loading data with sstableloader, each Cassandra node ends up with ~3,000 sstables of ~32 MB each.
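
For a sense of scale, here is a rough back-of-the-envelope calculation of the data volume per node. It is only a sketch that takes the approximate figures above at face value and assumes 1 GB = 1024 MB; the variable names are mine.

```python
# Rough per-node data volume after the bulk load.
# Both inputs are the approximate figures quoted above, not measured values.
sstable_count = 3000      # ~3,000 sstables per node
sstable_size_mb = 32      # ~32 MB each

total_mb = sstable_count * sstable_size_mb
total_gb = total_mb / 1024  # assumption: 1 GB = 1024 MB

print(f"{total_mb} MB total, roughly {total_gb:.0f} GB per node")
```

So a major compaction would have to merge on the order of 90+ GB spread over thousands of files in a single pass.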

In an attempt to reduce the number of sstables, I run 'nodetool compact' on each node.

This compaction puts tremendous pressure on the heap. I tried with an 8 GB heap (and also 16 GB, though I know that is advised against). In both cases the C* nodes end up spending ~90 seconds per garbage collection sweep, and the compaction is never able to complete.

Each machine has 32 GB physical memory. The bulk loaded table uses STCS and caching = 'keys_only'.

This leads to a number of questions:

Why do many smaller sstables put more pressure on the heap during compaction than fewer, larger sstables?
What is the right strategy going forward to get these bulk-loaded sstables compacted? Adding heap does not seem to solve it.
Is it safe to move some of the sstables away (using the Linux mv command), run "nodetool compact" on the remaining sstables, then mv the moved sstables back to their original location and run "nodetool compact" again? (A hack, I know.)

UPDATE

Actually, these are the sstable counts I have, most of them of similar size. Major compaction cannot complete because it runs out of memory, and I cannot find a way to make minor compaction kick in.

[Image: sstable counts]
