Cassandra partition technique

Question

From my understanding, Apache Cassandra places each row of a table into its own partition, and each partition is located on a separate node. In that case, for a table with millions of records (rows), Cassandra would have to spread the records across millions of nodes.

My question is: what happens if there are not enough nodes to store each record when a table with millions of records keeps growing?

Manish Khandelwal · Accepted Answer

Your understanding is not correct. The three key terms in your question are partition, row, and node. Consider how each of them is defined:

A node is a Cassandra process running on a virtual machine, a bare-metal server, or a cloud instance.

A partition is a logical unit that tells the Cassandra cluster which node holds the requested data. It is identified by the partition key, which is part of the primary key. The primary key as a whole must be unique for each row.

A row is a record contained within a partition. A partition can contain millions of rows.
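To make the row/partition distinction concrete, here is a minimal Python sketch. It is not Cassandra code, only an in-memory model. The table layout (a partition key `sensor_id` plus a second primary-key column `reading_no`) and all names are hypothetical, chosen just for illustration.

```python
from collections import defaultdict

# Hypothetical table whose primary key is (sensor_id, reading_no):
# sensor_id is the partition key, reading_no makes each row unique
# within its partition.
rows = [(f"sensor-{n % 3}", n, 20.0 + n) for n in range(9)]

# Group rows the way the definitions above describe: one partition per
# distinct partition key value, holding every row that shares that value.
partitions = defaultdict(dict)
for sensor_id, reading_no, value in rows:
    partitions[sensor_id][reading_no] = value

for sensor_id, partition_rows in sorted(partitions.items()):
    print(sensor_id, "->", len(partition_rows), "rows")
```

In this model nine rows land in only three partitions, because the number of partitions follows the number of distinct partition key values, not the number of rows.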

Based on the partition key, the Cassandra cluster identifies the node on which the data will reside. If you have three nodes, Cassandra hashes the partition key and uses the resulting value to pick the node where the data is written. As you scale out, the hash ranges are redistributed among the nodes, and the partitions are redistributed along with them.
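The hashing step described above can be sketched as a toy token ring in Python. This is only an illustration under stated assumptions, not Cassandra's actual implementation: MD5 is used as a stand-in hash (Cassandra's default partitioner is Murmur3), and the `Ring` class, the node names, and the choice of 64 tokens per node are all hypothetical.

```python
import bisect
import hashlib
from collections import Counter


def token(key: str) -> int:
    """Map a key to a 64-bit token. MD5 is only a stand-in hash here."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: each node owns several tokens (hypothetical layout)."""

    def __init__(self, nodes, tokens_per_node=64):
        self._ring = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(tokens_per_node)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, partition_key: str) -> str:
        # The first ring token at or after the key's token owns the key;
        # wrap around to the start of the ring if we run off the end.
        idx = bisect.bisect_left(self._tokens, token(partition_key))
        return self._ring[idx % len(self._ring)][1]


keys = [f"user-{n}" for n in range(3000)]

one_node = Ring(["node-a"])
three_nodes = Ring(["node-a", "node-b", "node-c"])
four_nodes = Ring(["node-a", "node-b", "node-c", "node-d"])

load_on_one = Counter(one_node.node_for(k) for k in keys)
load_on_three = Counter(three_nodes.node_for(k) for k in keys)

# Keys whose owner changes when node-d joins the cluster.
moved = [k for k in keys if three_nodes.node_for(k) != four_nodes.node_for(k)]

print(load_on_one)
print(load_on_three)
print(len(moved), "of", len(keys), "keys moved after adding node-d")
```

With a single node every key maps to that node. With three nodes the same keys are spread across all of them. When a fourth node joins, only the keys whose token range it takes over change owner, which is the redistribution mentioned above.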

So even if you have millions of records, they can all reside on a single node if your cluster has only one node. If you have multiple nodes, your data will be distributed almost equally among them.
