Reputation: 191
I am running Cassandra 1.0.7 on 5 nodes; each node has 8GB physical RAM and a 4GB heap. I've recently started getting frequent node failures like this:
WARN [ScheduledTasks:1] 2013-04-10 10:18:12,042 GCInspector.java (line 145) Heap is 0.9602098156121341 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-04-10 10:18:12,042 StorageService.java (line 2645) Flushing CFS(Keyspace='Company', ColumnFamily='01_Meta') to relieve memory pressure
WARN [ScheduledTasks:1] 2013-04-10 10:18:14,403 GCInspector.java (line 145) Heap is 0.9610030442856479 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
WARN [ScheduledTasks:1] 2013-04-10 10:18:14,403 StorageService.java (line 2645) Flushing CFS(Keyspace='Company', ColumnFamily='01_Meta') to relieve memory pressure
ERROR [MutationStage:23969] 2013-04-10 10:18:18,339 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[MutationStage:23969,5,main]
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at org.apache.cassandra.utils.SlabAllocator.allocate(SlabAllocator.java:68)
at org.apache.cassandra.utils.Allocator.clone(Allocator.java:32)
at org.apache.cassandra.db.Column.localCopy(Column.java:244)
at org.apache.cassandra.db.Memtable.resolve(Memtable.java:215)
at org.apache.cassandra.db.Memtable.put(Memtable.java:143)
at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:805)
at org.apache.cassandra.db.Table.apply(Table.java:431)
at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:256)
at org.apache.cassandra.service.StorageProxy$6.runMayThrow(StorageProxy.java:416)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1223)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
The startup parameters are:
/usr/lib/jvm/jdk1.6.0_31/bin/java
-ea
-javaagent:/usr/share/cassandra//lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42
-Xms4G
-Xmx4G
-Xmn200M
-XX:+HeapDumpOnOutOfMemoryError
-Xss128k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.net.preferIPv4Stack=true
-Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-Dcassandra-pidfile=/var/run/cassandra/cassandra.pid
-cp /etc/cassandra/conf:/usr/share/cassandra/lib/antlr-
Any ideas on where to start? I was looking here:
http://www.datastax.com/docs/1.0/operations/tuning#tuning-options-for-size-tiered-compaction
http://www.datastax.com/docs/1.0/operations/tuning#tuning-java-heap-size
But so far nothing seems out of the ordinary. Any suggestions greatly appreciated.
Upvotes: 1
Views: 2462
Reputation: 733
A 4GB heap for Cassandra on an 8GB machine seems quite high; you're taking RAM away from the kernel page cache and increasing GC pause times. I'd expect the heap to be more like 2GB.
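For reference, in Cassandra 1.0 the heap is controlled via `MAX_HEAP_SIZE` and `HEAP_NEWSIZE` in cassandra-env.sh. A sketch of a more conservative setting for an 8GB node follows; the exact values are illustrative assumptions, not a tuned recommendation:

```shell
# conf/cassandra-env.sh -- override the auto-calculated heap.
# Illustrative values for an 8GB node; tune to your own workload.
MAX_HEAP_SIZE="2G"    # leaves roughly 6GB for the OS page cache
HEAP_NEWSIZE="200M"   # young generation size (maps to -Xmn)
```

Setting both explicitly also keeps the heap stable across restarts, instead of depending on cassandra-env.sh's auto-sizing.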
Indeed, if you're deviating from any of the JVM settings in cassandra-env.sh and you don't fully understand the implications of what you've changed, you're already in a world of trouble. If you're doing it without graphing every metric you can get out of the JVM and Cassandra, you're in even more.
More than that, it's near impossible to diagnose memory issues without a lot of information, so you'll need to look at your data access patterns very closely. To start:
Look through the output of nodetool cfstats for anything out of the ordinary: for example, a very wide row that you expected to be skinny, or a row taking far more space than you expect.
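If you have many column families, scanning cfstats by eye gets tedious; here is a minimal sketch that flags suspiciously wide rows in captured cfstats output. The field name "Compacted row maximum size" and the 64MB threshold are assumptions based on Cassandra 1.0.x output and may differ in your version:

```python
# Sketch: flag suspiciously wide rows in `nodetool cfstats` output.
# "Compacted row maximum size" is the per-CF field as printed by
# Cassandra 1.0.x; adjust if your version names it differently.

def find_wide_rows(cfstats_text, threshold_bytes=64 * 1024 * 1024):
    """Return (column_family, max_row_bytes) pairs exceeding the threshold."""
    suspects = []
    current_cf = None
    for line in cfstats_text.splitlines():
        line = line.strip()
        if line.startswith("Column Family:"):
            current_cf = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted row maximum size:"):
            max_size = int(line.split(":", 1)[1].strip())
            if current_cf and max_size > threshold_bytes:
                suspects.append((current_cf, max_size))
    return suspects

sample = """\
Column Family: 01_Meta
    Compacted row maximum size: 134217728
Column Family: 02_Data
    Compacted row maximum size: 8192
"""
print(find_wide_rows(sample))  # [('01_Meta', 134217728)]
```

A 128MB row like the one above would explain sudden memtable pressure on writes to that CF.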
You should really have graphs of every metric you can pull out of Cassandra and the JVM. I use jmxtrans and Graphite for this; they are core tools in my Cassandra cluster. The insight I gained from them, and the data remodelling that followed, took me from a 12-node cluster with almost daily outages to a 3-node cluster with no downtime for the past year (and double the traffic). I can't stress this enough: you need proper trending for production clusters to properly understand, manage and optimise your data access.
Upvotes: 3