Reputation: 11
I was going through the DataStax documentation and found an interesting statement.
It claimed "Insert-heavy workloads are CPU-bound in Cassandra before becoming memory-bound".
Can someone explain how this claim is justified, and what might be causing this behavior in Cassandra?
Thanks.
Upvotes: 1
Views: 511
Reputation: 11100
For different workloads, Cassandra clusters can be CPU, memory, I/O or (occasionally) network bound. The documentation's claim is that if you start a new cluster and perform lots of inserts, the cluster will initially be CPU bound, but after a while it becomes bottlenecked on memory.
To process an insert, Cassandra needs to deserialize the messages from the clients, find which nodes should store the data and send messages to those nodes. Those nodes then store the data in an in-memory data structure called a memtable.
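To make the "find which nodes should store the data" step concrete, here is a rough sketch of a token ring lookup. It is only an illustration: the `Ring` class, the node names and the MD5 stand-in hash are my own assumptions (Cassandra's default partitioner is Murmur3-based, and real clusters usually use many virtual nodes per host), but it shows that routing an insert is hashing and searching, i.e. CPU work.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration only; this is NOT Cassandra's partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Hypothetical single-token-per-node ring (no virtual nodes)."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = min(replication_factor, len(nodes))
        # Sort nodes by their position (token) on the ring.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        # Find the first node clockwise from the key's token, wrapping around.
        start = bisect.bisect_right(self.tokens, token(partition_key)) % len(self.entries)
        # Take that node plus the next rf - 1 nodes on the ring.
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")  # three distinct nodes that would receive the write
```

The coordinator would then send the write to each node in `owners`.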
This is almost always CPU bound initially. However, as more data is inserted, the memtables grow large and are flushed to disk, and new (empty) memtables are created. The flushed memtables are stored in files known as SSTables. There is an ongoing background process called compaction that merges SSTables together into progressively larger files.
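The memtable, flush and compaction cycle described above can be sketched as a toy model. This is a simplification under stated assumptions: `ToyTable` and `memtable_limit` are hypothetical names, the flush threshold here counts rows rather than bytes, and "newest SSTable wins" stands in for Cassandra's timestamp-based reconciliation.

```python
class ToyTable:
    """Toy model of one node's write path; not Cassandra's real implementation."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical row-count threshold
        self.memtable = {}                    # in-memory writes
        self.sstables = []                    # immutable sorted runs, oldest first

    def insert(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted run and start a new empty memtable.
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def compact(self):
        # Merge all runs into one; later (newer) runs overwrite older values.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [sorted(merged.items())] if merged else []


table = ToyTable(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4)]:
    table.insert(key, value)
# Two SSTables exist at this point; compaction merges them into one.
table.compact()
```

Even in this toy version you can see the shift: the early inserts only touch a dictionary, while later ones trigger sorting, writing and merging of ever larger runs.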
There are a few reasons why more memory will help at this stage:
So inserts may become memory bound, but they could also become I/O bound. If there isn't enough I/O capacity to flush memtables, then inserts will block once the memtable flush queue is full. So I think the claim could be stated a bit more accurately:
Insert-heavy workloads are CPU-bound in Cassandra before becoming memory or I/O bound.
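The I/O-bound case comes down to a bounded queue filling up. Here is a minimal sketch of that back-pressure, assuming a hypothetical `FlushPipeline` class and `max_pending_flushes` parameter (these are illustrative names, not Cassandra settings). Instead of actually blocking, it reports when a writer would have to wait.

```python
import queue


class FlushPipeline:
    """Toy model of a bounded memtable flush queue."""

    def __init__(self, max_pending_flushes=2):
        self.pending = queue.Queue(maxsize=max_pending_flushes)

    def submit_flush(self, memtable):
        """Return True if the flush was queued, False if the writer would block."""
        try:
            self.pending.put_nowait(memtable)
            return True
        except queue.Full:
            return False

    def disk_completes_one_flush(self):
        # Simulates the disk finishing the oldest queued flush.
        return self.pending.get_nowait()


pipeline = FlushPipeline(max_pending_flushes=2)
pipeline.submit_flush({"a": 1})            # queued
pipeline.submit_flush({"b": 2})            # queued, queue is now full
stalled = not pipeline.submit_flush({"c": 3})  # disk has not caught up yet
pipeline.disk_completes_one_flush()        # disk frees one slot
resumed = pipeline.submit_flush({"c": 3})  # insert path can proceed again
```

If the disk drains the queue more slowly than inserts fill memtables, writers spend their time waiting on that queue rather than on CPU or memory.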
Upvotes: 5