Reputation: 91
We have a 6-node Cassandra cluster under heavy utilization. We have been struggling with garbage collector stop-the-world pauses, which can take up to 50 seconds on our nodes. During a pause the Cassandra node is unresponsive and does not even accept new logins.
Extra details:
Any help would be very much appreciated!
Edit 1:
I checked the object creation stats, and they do not look healthy at all.
Edit 2:
I have tried to use the suggested settings by Chris Lohfink, here is the GC report:
Using CMS suggested settings http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTAtNDk=
Using G1 suggested settings http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTAvOC8tLWdjLmxvZy4wLmN1cnJlbnQtLTE5LTExLTE3
The behavior remains basically the same:
I'm going to get the cfstats output for maximum partition size and tombstones per read asap and edit the post again.
Upvotes: 2
Views: 3171
Reputation: 16420
Without knowing your existing settings or possible data model problems, here's a guess at some conservative settings to try, aimed at reducing evacuation pauses caused by not having enough to-space (check the GC logs):
-Xmx12G -Xms12G -XX:+UseG1GC -XX:G1ReservePercent=25 -XX:G1RSetUpdatingPauseTimePercent=5 -XX:MaxGCPauseMillis=500 -XX:-ReduceInitialCardMarks -XX:G1HeapRegionSize=32m
These settings should also shorten the remembered-set update pause, which can become an issue. Setting G1HeapRegionSize reduces the number of humongous objects, which can be a problem depending on the data model. Make sure -Xmn is not set.
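To make the numbers behind those flags concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the usual G1 behavior where an allocation larger than half a region is treated as humongous, and that G1ReservePercent is applied to the total heap; the function name `g1_layout` is made up for illustration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def g1_layout(heap_bytes, region_bytes, reserve_percent):
    """Estimate G1 sizing figures from heap size, region size and reserve percent.

    Assumption: allocations larger than half a region are humongous, and the
    reserve is a percentage of the whole heap kept free as to-space headroom.
    """
    return {
        "regions": heap_bytes // region_bytes,
        "humongous_threshold_bytes": region_bytes // 2,
        "reserve_bytes": heap_bytes * reserve_percent // 100,
    }


# Values taken from the flags above: -Xmx12G, G1HeapRegionSize=32m, G1ReservePercent=25
layout = g1_layout(12 * GIB, 32 * MIB, 25)
print(layout["regions"])                           # 384
print(layout["humongous_threshold_bytes"] // MIB)  # 16
print(layout["reserve_bytes"] // GIB)              # 3
```

So with 32m regions, only allocations above roughly 16 MiB go down the humongous path, and about 3 GiB of the 12 GiB heap is held back as evacuation headroom.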
For what it's worth, a 12 GB heap with C* is probably better suited to CMS; you can certainly get better throughput. You just need to be careful about fragmentation over time, given the rather large objects that can get allocated.
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=55 -XX:MaxTenuringThreshold=3 -Xmx12G -Xms12G -Xmn3G -XX:+CMSEdenChunksRecordAlways -XX:+CMSParallelInitialMarkEnabled -XX:+CMSParallelRemarkEnabled -XX:CMSWaitDuration=10000 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCondCardMark
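As a quick sanity check on what those CMS flags imply, here is a small Python sketch of the arithmetic. It assumes the old generation is simply the heap minus -Xmn, and that with UseCMSInitiatingOccupancyOnly a concurrent cycle starts once old-generation occupancy reaches CMSInitiatingOccupancyFraction; `cms_layout` is a hypothetical helper name.

```python
GIB = 1024 ** 3


def cms_layout(heap_bytes, young_bytes, initiating_fraction):
    """Return (old generation size, occupancy at which a CMS cycle starts).

    Assumption: old gen = heap - young gen, and the initiating fraction is a
    percentage of old-generation capacity.
    """
    old_bytes = heap_bytes - young_bytes
    trigger_bytes = old_bytes * initiating_fraction // 100
    return old_bytes, trigger_bytes


# Values taken from the flags above: -Xmx12G, -Xmn3G, CMSInitiatingOccupancyFraction=55
old_bytes, trigger_bytes = cms_layout(12 * GIB, 3 * GIB, 55)
print(old_bytes / GIB)                # 9.0
print(round(trigger_bytes / GIB, 2))  # 4.95
```

In other words, CMS would start collecting once about 4.95 GiB of the 9 GiB old generation is in use, leaving roughly 4 GiB of headroom for promotions while the concurrent cycle runs.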
Most likely there's an issue with the data model, or you're under-provisioned, though.
Upvotes: 2
Reputation: 798
Have you looked at using Zing? Cassandra situations like these are a classic use case, as Zing fundamentally eliminates all GC-related glitches in Cassandra nodes and clusters.
You can see some details on the how/why in my recent "Understanding GC" talk from JavaOne (https://www.slideshare.net/howarddgreen/understanding-gc-javaone-2017). Or just skip to slides 56-60 for Cassandra-specific results.
Upvotes: 3