I'm running a 5-node Cassandra cluster which also runs Solr on all 5 nodes. I've ingested and indexed over a billion items, and this message now keeps being printed to the console:
    INFO 10:55:54,360 Unable to reduce heap usage since there are no dirty column families
    INFO 10:56:03,897 GC for ConcurrentMarkSweep: 538 ms for 1 collections, 2733064112 used; max is 3158310912
    WARN 10:56:03,898 Heap is 0.865356257870536 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
I have 8 GB of RAM per node and I've set MAX_HEAP_SIZE to 3G in cassandra-env.sh.
Could someone please shed some light on how I could resolve this?
Thanks, Majd
Upvotes: 1
Views: 1122
Reputation: 1021
It may simply be that you need more heap - add a gig or two and see what happens. On the other hand, you may need more system memory for OS file caching as well - 1 billion Solr-enabled rows seems like a lot for an 8 GB system.
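For reference, the heap is adjusted in cassandra-env.sh; the values below are only an illustration of "add a gig or two", not a recommendation tuned for your workload:

    # cassandra-env.sh - example values only; adjust for your hardware
    MAX_HEAP_SIZE="4G"      # was 3G in the question
    HEAP_NEWSIZE="400M"     # young generation; commonly sized around 100 MB per CPU core

Restart each node after changing these so the new JVM settings take effect.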
Generally, about 40 million to 100 million rows is the maximum capacity for a Solr-enabled DSE node. With 5 nodes and 1 billion rows, your cluster has about 200 million rows per node. Sometimes 200 million can be accommodated on a single node, and sometimes not - flip a coin there.
Also, a higher replication factor (RF) effectively increases the number of rows that Cassandra will place on each node. So divide that 40 million to 100 million row guidance by RF to get a reasonable target number of rows per node.
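As a back-of-the-envelope check (the question doesn't state the keyspace's RF, so RF=3 is assumed here purely for illustration):

    # rows-per-node sanity check - RF=3 is an assumption, not from the question
    TOTAL_ROWS=1000000000    # ~1 billion indexed items
    NODES=5
    RF=3
    echo $(( TOTAL_ROWS / NODES ))        # ~200 million unique rows per node
    echo $(( TOTAL_ROWS * RF / NODES ))   # ~600 million row replicas per node
    echo $(( 100000000 / RF ))            # ~33 million - upper guidance divided by RF

So even against the optimistic 100-million-row guidance, each node is carrying several times what it comfortably should.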
In short, you need a much bigger cluster: a minimum of 10 nodes, and maybe as many as 25. And with an RF of 3 or greater you could need even more nodes.
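The 10-to-25-node range follows from the same guidance; a quick sketch, using the assumed numbers above:

    # minimum node count = total rows / per-node guidance (before accounting for RF)
    echo $(( 1000000000 / 100000000 ))   # 10 nodes at the optimistic 100M-per-node end
    echo $(( 1000000000 / 40000000 ))    # 25 nodes at the conservative 40M-per-node end
    # with RF=3 the stored row count triples, pushing the node count higher still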
Upvotes: 4