Reputation: 521
We have a 9 node cluster and are running repairs every night as recommended (1 node each night).
We recently started having problems during the repairs, some nodes would die OutOfMemory because the GC would not collect fast enough. In the beginning it was a promotion issue (as shown by the detailed GC logs).
So we assumed that the CMS was not triggered fast enough and prevented ParNew from promoting surviging objects. When then lowered XX:CMSInitiatingOccupancyFraction
from 75 to 50 to force the old GC to trigger faster.
It seemed to work, but yesterday two nodes dies because the GC couldn't cope with the allocation speed, producing this kind of logs :
INFO [ScheduledTasks:1] 2013-09-27 23:36:38,111 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 21756 ms for 1 collections, 8003258240 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:36:38,878 GCInspector.java (line 142) Heap is 0.9746211436302873 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:36:57,018 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 17265 ms for 1 collections, 6587223560 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:36:57,243 GCInspector.java (line 142) Heap is 0.802179208376459 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:37:18,180 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 18437 ms for 1 collections, 6961687392 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:37:18,785 GCInspector.java (line 142) Heap is 0.8477806818323523 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:37:40,416 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 19032 ms for 1 collections, 7338693168 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:37:40,456 GCInspector.java (line 142) Heap is 0.893691708259552 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:38:02,994 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 18853 ms for 1 collections, 7570047632 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:38:03,008 GCInspector.java (line 142) Heap is 0.9218656026318086 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:38:26,110 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 19564 ms for 1 collections, 7714594464 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:38:26,132 GCInspector.java (line 142) Heap is 0.9394682332713986 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:38:49,733 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 20388 ms for 1 collections, 7843428464 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:38:49,748 GCInspector.java (line 142) Heap is 0.9551573859456055 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:39:14,564 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 20956 ms for 1 collections, 7934286376 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:39:14,578 GCInspector.java (line 142) Heap is 0.9662218848591505 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:39:40,186 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 22440 ms for 1 collections, 8008275464 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:39:40,915 GCInspector.java (line 142) Heap is 0.9752321313612954 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:40:01,836 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 19911 ms for 1 collections, 8022614576 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:40:06,032 GCInspector.java (line 142) Heap is 0.976978320390438 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2013-09-27 23:40:27,407 GCInspector.java (line 119) GC for ConcurrentMarkSweep: 22590 ms for 1 collections, 8058828880 used; max is 8211660800
WARN [ScheduledTasks:1] 2013-09-27 23:40:31,091 GCInspector.java (line 142) Heap is 0.9813884275395302 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [GossipTasks:1] 2013-09-27 23:40:53,798 Gossiper.java (line 799) InetAddress /<datacenter02>.<node2> is now DOWN
INFO [GossipTasks:1] 2013-09-27 23:40:53,846 Gossiper.java (line 799) InetAddress /<datacenter01>.<node3> is now DOWN
INFO [GossipStage:1] 2013-09-27 23:40:53,857 Gossiper.java (line 785) InetAddress /<datacenter01>.<node3> is now UP
INFO [GossipStage:1] 2013-09-27 23:40:53,909 Gossiper.java (line 785) InetAddress /<datacenter02>.<node2> is now UP
This time the heap grows and the GC run for 10-20 seconds without reducing the heap size, causing nodes to think that each other is down because they are busy GCing. In the end the nodes died of OOM.
We then tried to update to the latest version of Cassandra (1.2.8 -> 1.2.10) even though no fixed bug in these versions suggested any improvement for our problem. We then reran a repair during last night, but even though no nodes crashed, they failed to repair some ranges because of GCs of this kind :
INFO [ScheduledTasks:1] 2013-09-29 04:45:05,467 GCInspector.java (line 119) GC for ParNew: 22875 ms for 2 collections, 4128819328 used; max is 8211660800
INFO [ScheduledTasks:1] 2013-09-29 04:53:24,597 GCInspector.java (line 119) GC for ParNew: 133643 ms for 2 collections, 3102634584 used; max is 8211660800
This time it's ParNew taking ridiculous amounts of time.
I first thought of a load issue, but it continued to happen during the w-e when only the repair is happening.
Any help would be appreciated to diagnose / fix our issue.
Upvotes: 3
Views: 2848
Reputation: 2131
Try using the G1 GC instead of the CMS. G1 does not pause like that:
https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
Upvotes: 0
Reputation: 19377
The StatusLogger info doesn't show anything unusual except for GC taking a while. (Are you running on VMs? That tends to reduce GC performance: http://www.slideshare.net/eonnen/high-performance-network-programming-on-the-jvm-oscon-2012/62.)
My guess: repair adds enough load to the system that it falls behind processing requests and spends too much memory buffering them. You can verify this by looking for "dropped" messages in the log. By default it will buffer 10s worth of requests; to reduce this, lower the appropriate rpc timeouts in cassandra.yaml.
Upvotes: 2