Reputation: 1
I'm new to Kafka and am preparing to use it in production.
What strategies can be used to rebalance data storage when the brokers holding a topic's current partitions are running out of disk space, assuming more brokers can be added to the cluster?
As a simple example, say a topic starts with 3 partitions (1 replica each, to simplify the problem) on 3 brokers, each broker stores 1 partition of the topic, and each partition takes up 1TB of disk space.
How can I add 3 new brokers, increase the topic's partition count to 6, and end up with the data rebalanced so that each of the 6 partitions takes up 500GB of disk space on its broker?
I think this problem is critical for storing large amounts of data indefinitely in a Kafka cluster.
Thanks.
Upvotes: 0
Views: 1096
Reputation: 329
Also, keep in mind that a topic's replicas and ISRs are defined when the topic is created. Where possible, choose a replication factor of 3 for resiliency and durability. A replication factor of 2 in a 3-node cluster does not help in certain sticky situations: if one of the 3 brokers goes down, none of the remaining online brokers will join the replica set (to satisfy the replication factor) and move into the ISR. In a situation like this you end up with an incomplete ISR and, worse, a single point of failure.
Note that a broker being down is different from expanding or contracting the Kafka cluster.
Upvotes: 0
Reputation: 191681
kafka-reassign-partitions and kafka-preferred-replica-election are the built-in commands for such relocation tasks, as Kafka does not perform them automatically on cluster expansion.
There are also vendor-provided alternatives, such as tools from Confluent and DataDog.
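For illustration, kafka-reassign-partitions takes its plan as a JSON file listing, per partition, the broker IDs that should hold its replicas. Below is a minimal sketch that builds such a file in Python. The topic name "events" and the broker IDs 1 to 6 are hypothetical, and it assumes the topic already has 6 partitions.

```python
import json


def build_reassignment(topic, assignment):
    """Build a reassignment plan in the JSON layout read by kafka-reassign-partitions.

    `assignment` maps a partition id to the list of broker ids that should
    hold its replicas (the first id is the preferred leader).
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
            for partition, replicas in sorted(assignment.items())
        ],
    }


# Hypothetical layout: brokers 1-3 are the original nodes, 4-6 the new ones.
plan = build_reassignment(
    "events",
    {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [6]},
)
print(json.dumps(plan, indent=2))
```

The printed JSON can be saved to a file and passed to the tool with --reassignment-json-file together with --execute. The connection flags differ between Kafka versions, so check the documentation for the release you run.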
"How can I add 3 more new broker servers": see the Kafka docs section "Expanding your cluster".
"alter topic's partition amount to 6": use kafka-topics --alter to increase the partition count (note: this does not relocate existing data to the new partitions, or in other words it does not "re-key" the topic).
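To see why increasing the partition count does not rebalance what is already stored, here is a rough sketch of key-based partitioning. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses murmur2), and the keys are made up; the point is only that the key-to-partition mapping depends on the partition count.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in for the producer's partitioner: hash the key, then take it
    # modulo the partition count. CRC32 is an assumption for illustration.
    return zlib.crc32(key) % num_partitions


# Hypothetical message keys.
keys = [f"user-{i}".encode() for i in range(1000)]

# Count keys whose target partition changes when the topic grows from 3 to 6.
moved = sum(1 for k in keys if partition_for(k, 3) != partition_for(k, 6))
print(f"{moved} of {len(keys)} keys map to a different partition after 3 -> 6")
```

Only newly produced records follow the new mapping. Records already written stay in the original 3 partitions, so in the question's example those partitions keep their 1TB until data is deleted by retention or moved another way, and records for the same key can end up split across an old and a new partition.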
Upvotes: 1