James Boutcher

Reputation: 2613

Reducing Cassandra 1.1.x heap usage

Using Cassandra 1.1.5, we have been battling slow write performance and JVM GC lockups. In our logs, we see this rather frequently:

 WARN [ScheduledTasks:1] 2013-08-28 09:28:51,983 GCInspector.java (line 145) Heap is 0.8589157615524839 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

The largest memtable in our system (as observed through JConsole) grows to about 20,000,000 in data size (which I assume is ~20MB, if that figure is in bytes).

If it matters, this column family has almost 1B rows in it.

flush_largest_memtables_at is set to .75, but it seems we hit that threshold almost continuously. The workload for this column family is heavy writes and very few reads (essentially a clustered log).

Row cache is disabled, and the key cache is set to 40MB. We have 8GB of heap allocated to the JVM (out of 24GB of physical RAM).

Heap usage is mostly between 6.5 and 7.5GB.
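For reference, here is how the settings described above would look in the config files (restating the numbers from this question; setting names per the 1.1 defaults):

    # cassandra.yaml
    flush_largest_memtables_at: 0.75
    key_cache_size_in_mb: 40
    row_cache_size_in_mb: 0        # row cache disabled

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"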

Any advice on what to look at to reduce heap usage here? Surely it's not a function of how much data we have in the cluster, is it? (We have gobs of disk available across this cluster.)

Upvotes: 1

Views: 1461

Answers (3)

Chris Lohfink

Reputation: 16420

We found in 1.1 that raising the bloom_filter_fp_chance setting helps (a higher false-positive chance means a smaller bloom filter, and therefore less heap). If you use

nodetool cfstats 

it reports the bloom filter space used for each column family, which tells you how much you stand to gain. Another thing to consider, at the cost of read performance, is to increase index_interval in cassandra.yaml. I would recommend this if you have a lot of small rows; if you have wide rows, it may not be a good idea.

http://www.datastax.com/docs/1.1/configuration/node_configuration#index-interval
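For concreteness, a quick way to pull just the bloom filter lines out of cfstats, and an illustrative index_interval change (the 512 value is only an example; the 1.1 default is 128):

    # show bloom filter heap usage per column family
    nodetool -h localhost cfstats | grep -i 'bloom filter'

    # cassandra.yaml -- fewer index samples held in heap, slower reads
    index_interval: 512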

I would also recommend taking a heap dump and looking at what the heavy hitters are, though.
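A minimal sketch of taking such a dump with the stock JDK tools (the pid and file path are placeholders):

    # capture live objects from the Cassandra JVM; replace <pid>
    jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof <pid>

    # then open the .hprof in Eclipse MAT (or jhat) to find the dominators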

Upvotes: 0

jbellis

Reputation: 19377

The real fix is to upgrade to 1.2.x, where the bloom filters and compression metadata have been moved off-heap: http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
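For reference, the per-node steps of a rolling upgrade look roughly like this (a sketch of the standard procedure, not official upgrade docs; check NEWS.txt for 1.2-specific caveats):

    nodetool -h localhost drain              # flush memtables, stop accepting writes
    # stop Cassandra, install the 1.2.x binaries, start Cassandra
    nodetool -h localhost upgradesstables    # rewrite SSTables in the new format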

Upvotes: 3

James Boutcher

Reputation: 2613

Looks like in 1.1.x the bloom filter (which grows with the amount of data stored on each node) is held on the heap. Our -Filter.db files for a single column family were over 1.6GB.

Great article: http://nmmm.nu/bloomfilter.htm

We've raised the bloom_filter_fp_chance setting on this column family (which should shrink the bloom filter data), and we're running a scrub to rebuild the SSTables and see what happens.
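For anyone else in the same spot, the change looks roughly like this (a sketch; the keyspace and column family names are placeholders, the 0.1 value is just an example, and this assumes 1.1's cassandra-cli exposes the attribute):

    # cassandra-cli
    use MyKeyspace;
    update column family MyLogCF with bloom_filter_fp_chance = 0.1;

    # rebuild the SSTables (and their bloom filters) on each node
    nodetool -h localhost scrub MyKeyspace MyLogCF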

Upvotes: 1
