Cassandra: memory consumption while compacting

Question

I have ParNew GC warnings in system.log that report pauses of over 8 seconds:

WARN  [Service Thread] GCInspector.java:283 - ParNew GC in 8195ms.  CMS Old Gen: 22316280488 -> 22578261416; Par Eden Space: 1717787080 -> 0; Par Survivor Space: 123186168 -> 214695936

They seem to appear when minor compactions occur on a particular table:

92128ed0-46fe-11ec-bf5a-0d5dfeeee6e2 ks table 1794583380  1754598812  {1:92467, 2:5291, 3:22510}                                                                  
f6e3cd30-46fc-11ec-bf5a-0d5dfeeee6e2 ks table 165814525   160901558   {1:3196, 2:24814}                                                                           
334c63f0-46fc-11ec-bf5a-0d5dfeeee6e2 ks table 126097876   122921938   {1:3036, 2:24599}

The table :

is configured with the LCS compaction strategy.
average row size is 1MB.
there are also some wide rows, up to 60MB (from cfhistograms; I don't know whether that figure includes the LZ4 compression applied to the row).

The heap size is 32GB.

Questions:

a. How many rows must fit into memory (at once!) during the compaction process? Is it just one, or more?

b. While compacting, is each partition read into memory in decompressed form, or in compressed form?

c. Do you think the compaction process in my case could fill up all the heap memory?

Thank you

Full GC settings:

-Xms32G
-Xmx32G
#-Xmn800M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways

Aaron · Accepted Answer

a. How many rows must fit into memory (at once!) during the compaction process? Is it just one, or more?

It is definitely multiple.

b. While compacting, is each partition read into memory in decompressed form, or in compressed form?

Compression only applies at the disk level. Before compaction can do anything with the data, it has to read and decompress it.

c. Do you think the compaction process in my case could fill up all the heap memory?

Yes, the compaction process allocates a significant amount of the heap, and running compactions will cause issues with an already stressed heap.

TBH, I see several opportunities for improvement with the GC settings listed. And right now, I think that's where the majority of the problems are. Let's start with the new gen size:

#-Xmn800M

With CMS you absolutely need to be explicit about your heap new size (Xmn). Especially with a gigantic heap. And yes, with CMS 32GB is "gigantic." The 100MB per CPU core wisdom is incorrect. With Cassandra, the heap new size should be in the range of 25% to 50% of the max heap size (Xmx). For 32GB, I'd say uncomment the Xmn line and set it to -Xmn12G.
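As a quick sanity check on that range, here is a minimal Python sketch that turns the 25% to 50% guideline into numbers for a given max heap. The function name and the megabyte units are just illustrative choices, not anything Cassandra or the JVM provides.

```python
def new_gen_range_mb(max_heap_mb, low=0.25, high=0.50):
    """Return the (lower, upper) heap new size in MB for a given max heap.

    The 25%-50% bounds are the rule of thumb from the paragraph above,
    not a JVM default.
    """
    return int(max_heap_mb * low), int(max_heap_mb * high)


max_heap_mb = 32 * 1024  # -Xmx32G expressed in MB
low_mb, high_mb = new_gen_range_mb(max_heap_mb)
print(f"-Xmn should fall between {low_mb}M and {high_mb}M")
```

For a 32GB heap this gives 8G to 16G, so -Xmn12G sits in the middle of the range.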

So here is how memory is mapped out for CMS: [figure: CMS heap layout]

Now let's look at these two:

-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1

Laid out linearly, the heap is split into a new/young generation, the old generation, and the permanent generation (on Java 8 and later, the permanent generation is gone and Metaspace takes its place outside the heap). Major, stop-the-world collections happen on inter-generational promotion (e.g. new gen to old gen).

The new gen itself is split into the Eden space and the survivor spaces S0 and S1. What you want is for all your objects to be created, live, and die in the new gen space. For that to happen, the MaxTenuringThreshold (how many times an object can be copied between survivor spaces) needs to be higher. Also, the survivor spaces need to be big enough to pull their weight. With a SurvivorRatio of 8, each survivor space is only 1/8th the size of the Eden space. So I'd go with these, just to start:

-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=6

That'll make the survivor spaces bigger, and allow objects to be passed between them 6 times. Hopefully, that's long enough to avoid having to promote them.
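To see what the ratio change does in practice, here is a rough Python sketch of the arithmetic. It assumes the HotSpot definition of SurvivorRatio (Eden size divided by the size of one survivor space) and ignores the alignment the JVM applies to the real sizes, so treat the results as approximate. The function name is made up for illustration.

```python
GIB = 1024 ** 3


def young_gen_layout(xmn_bytes, survivor_ratio):
    """Approximate (eden, survivor) sizes in bytes for a young generation.

    SurvivorRatio = eden / one survivor space, so the young generation
    is divided into (survivor_ratio + 2) parts: the ratio's worth for
    Eden, plus one part each for S0 and S1.
    """
    survivor = xmn_bytes // (survivor_ratio + 2)
    eden = xmn_bytes - 2 * survivor
    return eden, survivor


# Compare the original ratio (8) with the suggested one (2) at -Xmn12G.
for ratio in (8, 2):
    eden, survivor = young_gen_layout(12 * GIB, ratio)
    print(
        f"SurvivorRatio={ratio}: "
        f"eden ~{eden / GIB:.1f} GiB, each survivor ~{survivor / GIB:.1f} GiB"
    )
```

With -Xmn12G, a ratio of 8 leaves each survivor space at about 1.2 GiB, while a ratio of 2 gives each one about 3 GiB and Eden about 6 GiB.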

Adding these will help, too:

-XX:+AlwaysPreTouch
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:-UseBiasedLocking

For more info on these, check out Amy's Cassandra 2.1 Tuning Guide. But with Cassandra you do want to "pre touch," you do want to enable thread-local allocation buffers (TLAB), you do want those buffers to be able to be resized, and you don't want biased locking.

Pick one of your nodes, make these changes, restart, and monitor performance. If they help (which I think they will), add them to the remaining nodes, as well.
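For the monitoring part, one option is to pull the numbers out of the GCInspector warnings and compare them before and after the change. The Python sketch below is only that, a sketch: the regular expression is inferred from the single log line in the question, so other Cassandra versions may format the message differently, and the function and field names are hypothetical.

```python
import re

# Pattern inferred from the one GCInspector line quoted in the question.
GC_LINE = re.compile(
    r"ParNew GC in (?P<pause_ms>\d+)ms\.\s+"
    r"CMS Old Gen: (?P<old_before>\d+) -> (?P<old_after>\d+); "
    r"Par Eden Space: (?P<eden_before>\d+) -> (?P<eden_after>\d+); "
    r"Par Survivor Space: (?P<surv_before>\d+) -> (?P<surv_after>\d+)"
)


def parse_parnew(line):
    """Return pause and space sizes (bytes) for a ParNew line, else None."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    return {
        "pause_ms": fields["pause_ms"],
        # Old gen growth across a ParNew cycle is roughly what got promoted.
        "old_gen_growth": fields["old_after"] - fields["old_before"],
        "eden_before": fields["eden_before"],
        "survivor_after": fields["surv_after"],
    }


SAMPLE = (
    "WARN  [Service Thread] GCInspector.java:283 - ParNew GC in 8195ms.  "
    "CMS Old Gen: 22316280488 -> 22578261416; "
    "Par Eden Space: 1717787080 -> 0; "
    "Par Survivor Space: 123186168 -> 214695936"
)

result = parse_parnew(SAMPLE)
print(f"pause: {result['pause_ms']} ms")
print(f"old gen growth: {result['old_gen_growth'] / 1024 ** 2:.0f} MiB")
```

If the changes help, the pause times and the old gen growth per ParNew cycle should both drop on the node you changed.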

tl;dr;

I'd make these changes:

-Xmn12G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=6
-XX:+AlwaysPreTouch
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:-UseBiasedLocking

References:

CASSANDRA-8150 - An ultimately unsuccessful attempt to alter the default JVM settings. But the ensuing discussion resulted in one of the best compilations of JVM tuning wisdom.
Amy's Cassandra 2.1 Tuning Guide - It may be dated, but this is still one of the most comprehensive admin guides for Cassandra. Many of the settings and approaches discussed are still very relevant.
