Reputation: 54
I have ParNew GC warnings in system.log that report pauses of more than 8 seconds:
WARN [Service Thread] GCInspector.java:283 - ParNew GC in 8195ms. CMS Old Gen: 22316280488 -> 22578261416; Par Eden Space: 1717787080 -> 0; Par Survivor Space: 123186168 -> 214695936
They seem to appear when minor compactions occur on a particular table:
92128ed0-46fe-11ec-bf5a-0d5dfeeee6e2 ks table 1794583380 1754598812 {1:92467, 2:5291, 3:22510}
f6e3cd30-46fc-11ec-bf5a-0d5dfeeee6e2 ks table 165814525 160901558 {1:3196, 2:24814}
334c63f0-46fc-11ec-bf5a-0d5dfeeee6e2 ks table 126097876 122921938 {1:3036, 2:24599}
The table:
LCS strategy.
1MB
60MB (from cfhistograms; I don't know whether it includes the LZ4 compression applied to that row).
The heap size is 32GB.
Questions:
a. How many rows must fit into memory (at once!) during the compaction process? Is it just one, or more?
b. While compacting, is each partition read into memory in decompressed form or in compressed form?
c. Do you think the compaction process in my case could fill up all the heap memory?
Thank you
Full GC settings:
-Xms32G
-Xmx32G
#-Xmn800M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways
Upvotes: 2
Views: 613
Reputation: 57748
a. How many rows must fit into memory (at once!) during the compaction process? Is it just one, or more?
It is definitely multiple.
b. While compacting, is each partition read into memory in decompressed form or in compressed form?
Compression only applies at the disk level. Before compaction can do anything with the data, it has to read it from disk and decompress it.
c. Do you think the compaction process in my case could fill up all the heap memory?
Yes, the compaction process allocates a significant amount of the heap, and running compactions will cause issues with an already stressed heap.
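To put numbers on the log line from the question, here is a minimal Python sketch that pulls the pause time and the before/after pool sizes out of it. The function name parse_gc_line and the regular expressions are my own, written against this one line only; other Cassandra versions may format the message differently.

```python
import re

# The GCInspector warning from the question, copied as-is.
LINE = ("WARN [Service Thread] GCInspector.java:283 - ParNew GC in 8195ms. "
        "CMS Old Gen: 22316280488 -> 22578261416; "
        "Par Eden Space: 1717787080 -> 0; "
        "Par Survivor Space: 123186168 -> 214695936")


def parse_gc_line(line):
    """Extract collector name, pause (ms) and per-pool (before, after) bytes."""
    pause = re.search(r"(\w+) GC in (\d+)ms\.", line)
    if pause is None:
        raise ValueError("not a GCInspector pause line")
    pools = {}
    # Each pool looks like "<name>: <bytes before> -> <bytes after>".
    for name, before, after in re.findall(r"([A-Za-z ]+): (\d+) -> (\d+)", line):
        pools[name.strip()] = (int(before), int(after))
    return {
        "collector": pause.group(1),
        "pause_ms": int(pause.group(2)),
        "pools": pools,
    }


info = parse_gc_line(LINE)
old_before, old_after = info["pools"]["CMS Old Gen"]
# Growth of the old gen during a young collection is what got promoted.
promoted = old_after - old_before
print(info["collector"], info["pause_ms"], "ms, promoted bytes:", promoted)
```

For this line the old gen grows by 261980928 bytes (roughly 250MB) during a single ParNew collection, which is the amount of data that was promoted out of the new gen.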
TBH, I see several opportunities for improvement with the GC settings listed. And right now, I think that's where the majority of the problems are. Let's start with the new gen size:
#-Xmn800M
With CMS you absolutely need to be explicit about your heap new size (Xmn), especially with a gigantic heap. And yes, with CMS 32GB is "gigantic." The 100MB per CPU core wisdom is incorrect. With Cassandra, the heap new size should be in the range of 25% to 50% of the max heap size (Xmx). For 32GB, I'd say uncomment the Xmn line and set it to -Xmn12G.
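As a quick check of the 25% to 50% rule of thumb, this small Python sketch works out the range for a 32GB heap. The helper name new_gen_range_gb is hypothetical and only does the arithmetic; it does not talk to a JVM.

```python
def new_gen_range_gb(xmx_gb, low=0.25, high=0.50):
    """Return the (min, max) new gen size in GB for a given max heap size."""
    return xmx_gb * low, xmx_gb * high


smallest, largest = new_gen_range_gb(32)
print(f"Xmn should fall between {smallest:.0f}G and {largest:.0f}G")
print(f"-Xmn12G is {12 / 32:.1%} of a 32G heap")
```

So 12G sits in the middle of the 8G to 16G window.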
So here is how memory is mapped out for CMS:
Now let's look at these two:
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
Laid out linearly, the heap is split into a new/young generation, the old generation, and the permanent generation (replaced as of Java 8 by Metaspace, which lives outside the heap). Major, stop-the-world collections happen on inter-generational promotion (ex: new gen to old gen).
Within the new gen, it is split into the Eden space and the survivor spaces S0 and S1. What you want is for all your objects to be created, live, and die in the new gen space. For that to happen, the MaxTenuringThreshold (how many times an object can be copied between survivor spaces) needs to be higher. Also, the survivor spaces need to be big enough to pull their weight. With a ratio of 1:8, each survivor space will be 1/8th of the Eden space. So I'd go with these, just to start:
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=6
That'll make the survivor spaces bigger, and allow objects to be passed between them 6 times. Hopefully, that's long enough to avoid having to promote them.
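To see what the ratio change means in practice, here is a rough Python sketch of the new gen layout. It assumes the usual HotSpot arrangement (new gen = Eden + two survivor spaces, SurvivorRatio = Eden size / one survivor size) and a 12G new gen; the JVM rounds sizes to its own alignment, so the real numbers will differ slightly. The function name young_gen_layout is my own.

```python
def young_gen_layout(xmn_mb, survivor_ratio):
    """Return (eden_mb, survivor_mb) for a new gen of xmn_mb megabytes.

    New gen = Eden + 2 survivors, and Eden = survivor_ratio * survivor,
    so one survivor space is xmn / (survivor_ratio + 2).
    """
    survivor = xmn_mb / (survivor_ratio + 2)
    eden = survivor * survivor_ratio
    return eden, survivor


XMN_MB = 12 * 1024  # -Xmn12G

for ratio in (8, 2):
    eden, survivor = young_gen_layout(XMN_MB, ratio)
    print(f"SurvivorRatio={ratio}: Eden {eden:.0f}MB, each survivor {survivor:.0f}MB")
```

With a 12G new gen, going from a ratio of 8 to 2 grows each survivor space from about 1.2G to 3G, at the cost of a smaller Eden.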
Adding these will help, too:
-XX:+AlwaysPreTouch
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:-UseBiasedLocking
For more info on these ^ check out Amy's Cassandra 2.1 Tuning Guide. But with Cassandra you do want to "pre touch," you do want to enable thread-local allocation buffers (TLAB), you do want those buffers to be able to be resized, and you don't want biased locking.
Pick one of your nodes, make these changes, restart, and monitor performance. If they help (which I think they will), add them to the remaining nodes, as well.
tl;dr;
I'd make these changes:
-Xmn12G
-XX:SurvivorRatio=2
-XX:MaxTenuringThreshold=6
-XX:+AlwaysPreTouch
-XX:+UseTLAB
-XX:+ResizeTLAB
-XX:-UseBiasedLocking
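If you want to preview the resulting option list before editing the file on a node, something like the following Python sketch can merge the changes into the settings from the question. The helpers flag_key and apply_changes are hypothetical names of my own, and the matching is deliberately simple: it only understands -XX: flags and the -Xms/-Xmx/-Xmn/-Xss forms.

```python
import re

CURRENT = """-Xms32G
-Xmx32G
#-Xmn800M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSWaitDuration=10000
-XX:+CMSParallelInitialMarkEnabled
-XX:+CMSEdenChunksRecordAlways""".splitlines()

CHANGES = ["-Xmn12G", "-XX:SurvivorRatio=2", "-XX:MaxTenuringThreshold=6",
           "-XX:+AlwaysPreTouch", "-XX:+UseTLAB", "-XX:+ResizeTLAB",
           "-XX:-UseBiasedLocking"]


def flag_key(option):
    """Return a name identifying which setting a JVM option controls."""
    m = re.match(r"-XX:[+-]?([A-Za-z0-9]+)", option)
    if m:
        return m.group(1)
    m = re.match(r"-X(ms|mx|mn|ss)", option)
    if m:
        return "X" + m.group(1)
    return option


def apply_changes(current, changes):
    """Merge changes into current options; a change replaces the same setting."""
    merged = {}
    for opt in current:
        opt = opt.strip()
        if not opt or opt.startswith("#"):
            continue  # skip blank and commented-out lines
        merged[flag_key(opt)] = opt
    for opt in changes:
        merged[flag_key(opt)] = opt
    return list(merged.values())


for line in apply_changes(CURRENT, CHANGES):
    print(line)
```

Overridden settings keep their original position in the list, and new ones are appended at the end.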
Upvotes: 2