user2864894

Reputation:

Cassandra and heap size

I'm running a 5-node Cassandra cluster which also runs Solr on all 5 nodes. I've ingested and indexed over a billion items, and these messages keep being printed to the console:

INFO 10:55:54,360 Unable to reduce heap usage since there are no dirty column families
INFO 10:56:03,897 GC for ConcurrentMarkSweep: 538 ms for 1 collections, 2733064112 used; max is 3158310912
WARN 10:56:03,898 Heap is 0.865356257870536 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

Each node has 8 GB of RAM, and I've set MAX_HEAP_SIZE to 3G in cassandra-env.sh.
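For reference, the setting described above would look roughly like this in cassandra-env.sh (the HEAP_NEWSIZE value is illustrative, not taken from the question; Cassandra's own guidance is to size the young generation at roughly 100 MB per CPU core):

```shell
# Excerpt from cassandra-env.sh (sketch, not a recommendation).
# If MAX_HEAP_SIZE is left unset, Cassandra computes it from system memory.
MAX_HEAP_SIZE="3G"

# Young-generation size; illustrative value, tune per core count.
HEAP_NEWSIZE="400M"
```

Both variables must be set together: cassandra-env.sh aborts startup if only one of them is defined.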

Could someone please shed some light on how I can resolve this?

Thanks Majd

Upvotes: 1

Views: 1122

Answers (1)

Jack Krupansky

Reputation: 1021

It may simply be that you need more heap - add a gig or two and see what happens. On the other hand, you may need more system memory for the OS file cache as well - a billion Solr-enabled rows seems like a lot for an 8 GB system.

Generally, about 40 million to 100 million rows is the maximum capacity for a Solr-enabled DSE node. With 5 nodes and 1 billion rows, your cluster has about 200 million rows per node. Sometimes 200 million can be accommodated on a single node, and sometimes not - flip a coin there.

Also, a higher replication factor effectively increases the number of rows that Cassandra will place on each node. So, divide that 40 million to 100 million row guidance by RF to get a decent target number of rows per node.

In short, you need a much bigger cluster - a minimum of 10 nodes, and maybe as many as 25. And with an RF of 3 or greater, you could need even more nodes.
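The sizing arithmetic above can be sketched in a few lines of shell (the 40-100 million rows-per-node figures are the rough guidance from this answer, not a DSE formula):

```shell
#!/bin/sh
# Rough cluster-sizing sketch: minimum node count is the ceiling of
# (total_rows * replication_factor) / rows_per_node.
total_rows=1000000000
per_node_high=100000000   # optimistic capacity per Solr-enabled node
per_node_low=40000000     # conservative capacity per Solr-enabled node
rf=3

# Ceiling division: (a + b - 1) / b
echo "RF=1, optimistic:   $(( (total_rows + per_node_high - 1) / per_node_high )) nodes"
echo "RF=1, conservative: $(( (total_rows + per_node_low - 1) / per_node_low )) nodes"
echo "RF=3, optimistic:   $(( (total_rows * rf + per_node_high - 1) / per_node_high )) nodes"
```

With these assumptions, the RF=1 range comes out to the 10-25 nodes mentioned above, and RF=3 pushes the optimistic minimum to 30.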

Upvotes: 4
