Zachary

Reputation: 1

How to rebalance data in Kafka when data is stored persistently

I'm new to Kafka and preparing to use it in production.

What strategies can be used to rebalance data storage when the brokers hosting a topic's current partitions are running out of disk space, assuming more brokers can be added to the cluster?

As a simple example, say a topic starts with 3 partitions (1 replica, to simplify the problem), 3 brokers each store 1 partition of the topic, and each partition takes up 1 TB of disk space.

How can I add 3 new broker servers, increase the topic's partition count to 6, and end up with each of the 6 partitions taking up 500 GB of disk space on its own broker?

I think this problem is critical for storing large amounts of data indefinitely in a Kafka cluster.

Thanks.

Upvotes: 0

Views: 1096

Answers (2)

AtulyaB

Reputation: 329

Also keep in mind that replica assignments (and therefore the ISRs) are defined when you create a topic. Where possible, choose a replication factor of 3 for resiliency and durability. A replication factor of 2 in a 3-node cluster can leave you exposed in certain failure scenarios: if one of the 3 brokers goes down, Kafka does not automatically add one of the remaining online brokers to the replica set (to restore the replication factor) and into the ISR. In that situation you end up with an incomplete ISR and, worse, a single point of failure.
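To make the replication-factor point concrete, here is a minimal Python sketch that counts partitions left with a single in-sync replica after one broker fails. It assumes a simplified round-robin replica placement (not Kafka's exact assignment algorithm) and that every replica was in sync before the failure; the function names are hypothetical.

```python
"""Count partitions left with a single in-sync replica after one broker fails.

Assumptions (hypothetical, for illustration): replicas are placed round-robin,
all replicas are in sync before the failure, and no replacement replica is
created on another broker, since Kafka does not reassign replicas on its own.
"""


def assign_replicas(num_partitions, brokers, replication_factor):
    """Place replication_factor consecutive brokers on each partition."""
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def isr_after_failure(assignment, failed_broker):
    """In-sync replicas per partition once failed_broker is offline."""
    return {p: [b for b in replicas if b != failed_broker]
            for p, replicas in assignment.items()}


def single_points_of_failure(isr):
    """Partitions whose ISR has shrunk to exactly one broker."""
    return sorted(p for p, replicas in isr.items() if len(replicas) == 1)


brokers = [1, 2, 3]
for rf in (2, 3):
    isr = isr_after_failure(assign_replicas(3, brokers, rf), failed_broker=1)
    print(f"RF={rf}: partitions with one in-sync replica -> {single_points_of_failure(isr)}")
```

With broker 1 down, the RF=2 case reports partitions [0, 2] as single points of failure, while the RF=3 case reports none.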

Note that a broker being down is different from expanding or contracting the Kafka cluster.

Upvotes: 0

OneCricketeer

Reputation: 191681

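kafka-reassign-partitions and kafka-preferred-replica-election are the built-in commands for such relocation tasks, as Kafka does not move data automatically when the cluster is expanded. (In newer Kafka releases, kafka-preferred-replica-election is superseded by kafka-leader-election.)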
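kafka-reassign-partitions reads a JSON reassignment file that lists the target replicas for each partition. A minimal Python sketch that builds one; the topic name "events" and the move of partition 2 onto new broker 4 are hypothetical, chosen to match the question's 3-to-6 broker example.

```python
import json


def build_reassignment(topic, moves):
    """Build the JSON document read by kafka-reassign-partitions.

    moves maps a partition number to its new replica list; the first broker
    in each list is the preferred leader.
    """
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": p, "replicas": list(replicas)}
            for p, replicas in sorted(moves.items())
        ],
    }


# Hypothetical plan: move single-replica partition 2 of "events" to broker 4.
plan = build_reassignment("events", {2: [4]})
print(json.dumps(plan, indent=2))
```

Save the output as reassign.json and apply it with kafka-reassign-partitions.sh --bootstrap-server <host:port> --reassignment-json-file reassign.json --execute, then rerun with --verify until the move completes (older releases take --zookeeper instead of --bootstrap-server). Moving a partition relocates the whole partition, so the full 1 TB moves to broker 4; it is not split.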

There are also vendor alternatives, such as tools from Confluent and Datadog.

How can I add 3 more new broker servers

See the Kafka documentation: "Expanding your cluster".
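The expansion procedure can also let the tool propose the plan: you list the topics to move and the target brokers, run --generate, then --execute and --verify. A minimal sketch that builds the topics-to-move file and the three command lines as argument lists (usable with subprocess.run, for example); the bootstrap address, topic name and file names are placeholders.

```python
import json


def topics_to_move(topics):
    """JSON document for --topics-to-move-json-file."""
    return {"topics": [{"topic": t} for t in topics], "version": 1}


def reassign_commands(bootstrap, topics_file, plan_file, broker_ids):
    """Argument lists for the generate / execute / verify steps."""
    base = ["kafka-reassign-partitions.sh", "--bootstrap-server", bootstrap]
    return {
        # Prints the current and a proposed assignment over broker_ids.
        "generate": base + ["--topics-to-move-json-file", topics_file,
                            "--broker-list", ",".join(str(b) for b in broker_ids),
                            "--generate"],
        # plan_file holds the proposed assignment copied from the generate output.
        "execute": base + ["--reassignment-json-file", plan_file, "--execute"],
        "verify": base + ["--reassignment-json-file", plan_file, "--verify"],
    }


print(json.dumps(topics_to_move(["events"])))
for step, args in reassign_commands("localhost:9092", "topics.json",
                                    "plan.json", [1, 2, 3, 4, 5, 6]).items():
    print(step, ":", " ".join(args))
```

With only 3 single-replica partitions, any proposed plan can place data on at most 3 of the 6 brokers; using all 6 also requires more partitions.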

alter topic's partition amount to 6

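Use kafka-topics --alter with --partitions to increase the partition count (note: this does not relocate existing data to the new partitions, or in other words "re-key" the topic; the three original partitions keep their 1 TB each, and only data produced afterwards is spread over all 6).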
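Why adding partitions does not re-key existing data: for keyed records the default partitioner picks a partition from a hash of the key modulo the partition count, so changing the count only affects records produced afterwards. The sketch below uses zlib.crc32 as a stand-in hash (Kafka's default partitioner actually uses murmur2); the modulo step is what matters, and the key names are hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Hash-then-modulo partitioning; crc32 stands in for Kafka's murmur2."""
    return zlib.crc32(key) % num_partitions


def remapped_keys(keys, old_count, new_count):
    """Keys whose future records land in a different partition after the change."""
    return [k for k in keys
            if partition_for(k, old_count) != partition_for(k, new_count)]


keys = [f"user-{i}".encode() for i in range(1000)]
moved = remapped_keys(keys, old_count=3, new_count=6)
print(f"{len(moved)} of {len(keys)} keys now map to a different partition")
# Records already written stay where they are, so a remapped key's history
# is now split between its old partition and its new one.
```

Because 6 is a multiple of 3, a key in partition p either stays there or moves to p + 3; none of the records already on disk in partitions 0 to 2 move.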

Upvotes: 1
