moldcraft

Reputation: 448

Cassandra column family bigger than a node's drive space

I want to start a Cassandra cluster (e.g. 3 nodes), and my application has only one column family.

After reading the documentation, I understand how Cassandra replicates a column family across multiple nodes.

For example, every node has 2 TB of drive space and the column family is replicated to every node, so every node contains a full copy of it.

What happens if, after some years, that column family exceeds 2 TB and I have no possibility of increasing the drive space?

If I add 10 more nodes, I want the column family to be split into parts and stored on the drives of different nodes, so that it can grow without limit. Do I understand correctly that a column family is limited to the smallest drive space in the cluster?

Upvotes: 0

Views: 173

Answers (1)

Richard

Reputation: 11100

The scenario you describe applies only when all data is replicated to all nodes. You configure this by setting the replication factor (RF) equal to the number of nodes.

However, the RF can be lower than the number of nodes, and it does not need to grow when you add more nodes.

For example, if you have 3 nodes today with RF 3, each node contains a copy of all the data, as you say. But if you then add 3 more nodes and keep RF at 3, each node holds about half the data. You can keep adding nodes so that each node contains a smaller and smaller proportion of the data.
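To make the arithmetic concrete, here is a minimal sketch of the per-node share. It assumes data is spread evenly across nodes, which real clusters only approximate, and `per_node_share` is a name made up for this illustration, not part of any Cassandra API.

```python
from fractions import Fraction


def per_node_share(num_nodes: int, replication_factor: int) -> Fraction:
    """Approximate fraction of the full dataset held by each node.

    Assumes an even distribution of data across nodes (an idealisation).
    """
    if num_nodes < 1 or replication_factor < 1:
        raise ValueError("num_nodes and replication_factor must be >= 1")
    # A node stores at most one copy of any given row, so the share
    # can never exceed the whole dataset.
    copies = min(replication_factor, num_nodes)
    return Fraction(copies, num_nodes)


print(per_node_share(3, 3))   # 3 nodes, RF 3: every node holds everything
print(per_node_share(6, 3))   # 6 nodes, RF 3: every node holds about half
print(per_node_share(13, 3))  # 13 nodes, RF 3: the share keeps shrinking
```

With RF held fixed, the share is roughly RF divided by the number of nodes, so it falls as the cluster grows.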

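Therefore there is, in principle, no limit to how big your data can be: the cluster's total capacity grows with the number of nodes, not with the size of any single drive.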
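Turning that around, you can roughly estimate how many nodes a given data size needs. This is only a sketch: `min_nodes` and its `max_fill` parameter are hypothetical names for this example, the even-distribution assumption still applies, and the amount of free space to reserve on each drive is something you would have to decide for your own workload.

```python
import math


def min_nodes(total_data_tb: float, disk_per_node_tb: float,
              replication_factor: int, max_fill: float = 1.0) -> int:
    """Rough lower bound on the node count for a given dataset size.

    max_fill is the fraction of each drive you allow to fill up
    (an assumption of this sketch; 1.0 means no headroom at all).
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    raw_tb = total_data_tb * replication_factor   # every row is stored RF times
    usable_tb = disk_per_node_tb * max_fill       # space usable on one node
    # At least RF nodes are needed so each replica lands on a distinct node.
    return max(replication_factor, math.ceil(raw_tb / usable_tb))


print(min_nodes(2, 2, 3))                 # 2 TB of data fits on 3 nodes at RF 3
print(min_nodes(4, 2, 3))                 # 4 TB of data needs 6 such nodes
print(min_nodes(10, 2, 3, max_fill=0.5))  # keeping drives half empty needs more
```

The point is that the limit is set by the sum of the drives, divided by RF, rather than by the smallest drive.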

Upvotes: 2
