user3383468

Reputation: 161

How does Cassandra improve performance when nodes are added?

I'm going to build an Apache Cassandra 3.11.x cluster with 44 nodes. Each application server will host one cluster node so that the application can read and write locally. I have a few questions; please answer if possible.

  1. How many server IPs should be listed in the seed node parameter?
  2. How does HA work when all the listed seed nodes go down?
  3. What is the disadvantage of listing all the server IPs in the seed node parameter?
  4. How does Cassandra scale with respect to data, other than through the primary key and tunable consistency? My assumption is that the replication factor improves HA but not performance, so how does performance increase when more nodes are added?
  5. Is there any sharding mechanism in Cassandra?

Upvotes: 1

Views: 1412

Answers (1)

Alex Ott

Reputation: 87164

Answers are in order:

  1. It's recommended to point to at least 2 nodes per DC.
  2. A seed/contact node is used only for the initial bootstrap: when your program reaches any of the listed nodes, it "learns" the topology of the whole cluster, and the driver then listens for node status changes and adjusts its list of available hosts. So even if the seed node(s) go down after the connection is established, the driver will still be able to reach the other nodes.
  3. It's usually harder to maintain: you need to keep the configuration parameters for your driver and the list of nodes in sync.
  4. When you have RF > 1, Cassandra may read data from or write data to any replica. The consistency level controls how many replicas must respond for a read or write operation to succeed. When you add a new node, part of the data is redistributed to it, and if you have chosen the partition key well, the new node starts to receive requests in parallel with the old nodes.
  5. The partition key determines which replica(s) hold the data associated with it, so you can think of a partition as a shard. But you need to be careful when choosing the partition key: it's easy to create partitions that are too big, or partitions that are "hot" (receiving most of the operations in the cluster; for example, if you use the date as the partition key and always write and read data for today).
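
To make points 4 and 5 a bit more concrete, here is a rough toy model of a token ring in Python. It is only a sketch under simplifying assumptions, not how Cassandra is actually implemented: the `Ring` class, `token`, `load_per_node`, the node names and the use of MD5 are all made up for illustration (Cassandra's default partitioner is Murmur3, and real replica placement also depends on the replication strategy and DC/rack layout).

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    """Hash a key to a ring position (MD5 here purely for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns several tokens (virtual nodes)."""

    def __init__(self, nodes, rf=3, vnodes=64):
        self.rf = rf
        self.entries = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        start = bisect.bisect_right(self.tokens, token(partition_key))
        owners = []
        for i in range(len(self.entries)):
            node = self.entries[(start + i) % len(self.entries)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == self.rf:
                    break
        return owners


def load_per_node(ring: Ring, keys) -> Counter:
    """Count how many operations each node receives for the given keys."""
    counts = Counter()
    for key in keys:
        counts.update(ring.replicas(key))
    return counts


# Hypothetical workloads: well-spread keys vs. a single date used as the key.
spread_keys = [f"user-{i}" for i in range(10_000)]
hot_keys = ["2024-01-01"] * 1_000

small = load_per_node(Ring([f"node{i}" for i in range(4)]), spread_keys)
big = load_per_node(Ring([f"node{i}" for i in range(8)]), spread_keys)
hot = load_per_node(Ring([f"node{i}" for i in range(8)]), hot_keys)
```

If you compare `small` and `big`, the total number of replica operations is the same, but with more nodes each node handles a smaller share of it. `hot` shows the opposite case: every operation lands on the same RF nodes regardless of cluster size, which is why a date-only partition key does not benefit from adding nodes.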

P.S. I would recommend reading the DataStax Architecture guide; it contains a lot of information about Cassandra as well.

Upvotes: 3
