sunillp

Reputation: 993

Cassandra write throughput and scalability

This may sound like a dumb question, but I would like an expert to answer or confirm it.

Let's say I have a 3-node Cassandra cluster with one database and just one table. For this single table, let's say I get a throughput of 1K writes/second with the 3-node cluster. If tomorrow the write load on this table grows to 10K or 20K writes/second, will I be able to handle it by increasing the size of the cluster by, say, 10x or 20x?

My understanding is that this is possible (since Cassandra is both read and write scalable), but I would like an expert to confirm.

Upvotes: 4

Views: 19190

Answers (4)

Mandraenke

Reputation: 3266

Yes, but only if your data is properly modeled. In particular, your data needs to be distributed evenly among your partition keys (since they map to specific replica nodes) to avoid hot spots. Given that, yes, Cassandra will scale horizontally well.

A "table" in Cassandra is distributed among all nodes in your cluster. Each node is responsible for a range of tokens, which are hashes of the partition key portion of your primary key.
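To make the token idea concrete, here is a minimal toy sketch, not Cassandra's actual implementation. It assumes MD5 as a stand-in for the real partitioner hash, one evenly spaced token per node instead of virtual nodes, and it ignores replication. The names `token_for`, `owner` and `writes_per_node` are hypothetical helpers made up for this illustration.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2**64  # toy token space (assumption, not Cassandra's real range)


def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def owner(tokens: list, partition_key: str) -> int:
    """Return the index of the node whose token range contains the key.

    A key belongs to the first node token >= its own token, wrapping
    around to node 0 at the end of the ring.
    """
    idx = bisect.bisect_left(tokens, token_for(partition_key))
    return idx % len(tokens)


def writes_per_node(node_count: int, keys: list) -> Counter:
    """Count how many writes each node receives for the given keys."""
    step = RING_SIZE // node_count
    tokens = [i * step for i in range(node_count)]  # evenly spaced tokens
    return Counter(owner(tokens, key) for key in keys)


# Many distinct partition keys: writes spread over all three nodes.
even = writes_per_node(3, [f"user-{i}" for i in range(30000)])

# One hot partition key: every write lands on the same node.
hot = writes_per_node(3, ["user-42"] * 30000)

print("evenly keyed:", dict(even))
print("single hot key:", dict(hot))
```

With many distinct keys the three counts come out close to each other, while the single hot key sends all 30000 writes to one node, which is the hot spot that adding nodes cannot fix.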

Now, if you double your node count, for example, the existing token ranges are split in half and redistributed while the new nodes bootstrap. Each node will then handle only half of your initial requests. If you double your requests afterwards, each node will have roughly the same load as before.

For read-intensive workloads, choosing a higher replication factor helps when you can live with stale data for a while (e.g. reading and writing at a low consistency level).
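The arithmetic behind that can be sketched roughly as follows. This is a back-of-the-envelope model, assuming perfectly even key distribution and a replication factor of 3 (the question does not state one); `replica_writes_per_node` is a hypothetical helper, not a Cassandra API.

```python
def replica_writes_per_node(client_writes_per_sec: float,
                            node_count: int,
                            replication_factor: int = 3) -> float:
    """Average replica writes each node performs per second.

    Every client write is stored on `replication_factor` nodes, and the
    work is assumed to be spread evenly over `node_count` nodes.
    """
    return client_writes_per_sec * replication_factor / node_count


before = replica_writes_per_node(1_000, 3)          # original cluster
doubled_nodes = replica_writes_per_node(1_000, 6)   # twice the nodes, same load
doubled_load = replica_writes_per_node(2_000, 6)    # twice the nodes, twice the load
tenfold = replica_writes_per_node(10_000, 30)       # 10x nodes, 10x load

print(before, doubled_nodes, doubled_load, tenfold)
```

Under these assumptions the per-node load halves when the node count doubles and returns to its original value when the client load grows by the same factor as the cluster.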

There are good tutorials from DataStax available here https://academy.datastax.com/

Upvotes: 4

Ashraful Islam

Reputation: 12830

Yes, Cassandra scales linearly.

The scalability is linear as shown in the chart below. Each client system generates about 17,500 write requests per second, and there are no bottlenecks as we scale up the traffic. Each client ran 200 threads to generate traffic across the cluster.

[Image: chart from the benchmark linked below]

Source : https://medium.com/netflix-techblog/benchmarking-cassandra-scalability-on-aws-over-a-million-writes-per-second-39f45f066c9e

Upvotes: 9

S. Stas

Reputation: 810

Yes, that is so, with one remark: you should also consider the replication factor (RF) and consistency level (CL), as they affect the scaling behaviour too.
For example, if you initially have 10 nodes with RF=3 and you increase the node count to 20 with the same RF=3, you'll get a linear increase in write throughput.
But if you want to increase read throughput, you need to increase RF. And with the increased RF, you have to lower the write consistency level to improve write throughput.
To summarize, you cannot increase both read and write throughput linearly with the same RF and CL parameters.
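As a rough illustration of how RF and CL interact, the sketch below computes how many replica acknowledgements a request waits for. It covers only a few common consistency levels, and the helper names (`quorum`, `acks_required`, `reads_see_latest_write`) are made up for this example. Note that a write is still sent to all RF replicas whatever the CL; the CL only controls how many acknowledgements the coordinator waits for.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: a majority of the replicas."""
    return replication_factor // 2 + 1


def acks_required(replication_factor: int, consistency_level: str) -> int:
    """Replica acknowledgements a request waits for at a given CL."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def reads_see_latest_write(replication_factor: int,
                           write_cl: str,
                           read_cl: str) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    writes = acks_required(replication_factor, write_cl)
    reads = acks_required(replication_factor, read_cl)
    return writes + reads > replication_factor


for rf in (3, 5):
    print(f"RF={rf}: QUORUM waits for {quorum(rf)} replicas")

print(reads_see_latest_write(3, "QUORUM", "QUORUM"))  # overlapping sets
print(reads_see_latest_write(3, "ONE", "ONE"))        # may read stale data
```

Raising RF from 3 to 5 raises the quorum from 2 to 3 replicas, which is why a higher RF tends to go together with a lower write CL if write latency and throughput are to stay the same.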

Upvotes: 0

MD Ruhul Amin

Reputation: 4502

DataStax states that:

What are the benefits of Apache Cassandra?

Massively scalable ring architecture: Based on the best of Amazon Dynamo and Google BigTable, Cassandra’s peer-to-peer architecture overcomes the limitations of master-slave designs and allows for both high availability and massive scalability.

Linear scale performance: Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations.


So the answer is YES, it is possible. It may take some time to add a new node and redistribute tokens, but throughput will scale as you change the number of nodes.

If you need more info to understand how it will scale, check the links below:

  1. Benchmarking Cassandra Scalability on AWS
  2. Adding nodes to Cassandra
  3. Adding, replacing, moving and removing nodes

Upvotes: 1
