Vijay Kansal

Kafka Partitions Reassignment Performance Impact

I have a Kafka production cluster with 5 nodes and about 500 topics. I need to expand the cluster by adding 2 new nodes. Since Kafka does not automatically redistribute existing partitions onto new brokers, I plan to run kafka-reassign-partitions.sh, which ships with the Kafka distribution, to rebalance all my topics across the 7 nodes.
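In its --generate mode, kafka-reassign-partitions.sh takes a JSON file listing the topics to move plus a broker list, and proposes a new assignment without moving any data. A minimal Python sketch for building that input follows. The topic names, the broker ids 0 to 6, and the ZooKeeper address zk1:2181 are placeholders, not values from this cluster.

```python
import json
from typing import Iterable, List


def topics_to_move_json(topics: Iterable[str]) -> str:
    """Build the payload for --topics-to-move-json-file."""
    return json.dumps({"version": 1, "topics": [{"topic": t} for t in topics]}, indent=2)


def generate_command(json_path: str, broker_ids: Iterable[int],
                     zookeeper: str = "zk1:2181") -> List[str]:
    """Argument list for a --generate run (proposes a plan; moves no data)."""
    return [
        "bin/kafka-reassign-partitions.sh",
        "--zookeeper", zookeeper,  # placeholder address
        "--topics-to-move-json-file", json_path,
        "--broker-list", ",".join(str(b) for b in broker_ids),
        "--generate",
    ]


if __name__ == "__main__":
    topics = ["orders", "payments"]  # hypothetical names; list all ~500 topics here
    print(topics_to_move_json(topics))
    print(" ".join(generate_command("topics-to-move.json", range(7))))
```

The proposed assignment printed by --generate can be saved and reviewed before anything is executed.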

Since I already have a large amount of production data in my cluster,

  1. Will running this script block concurrent writes to my Kafka topics?
  2. Will running this script slow down my cluster, producers, or consumers?
  3. How can I stop this script while it is in progress if my cluster starts misbehaving during its execution?

I am currently using Kafka v0.8.2.0 with multiple producers and multiple consumers.

Answers (1)

Gwen Shapira

kafka-reassign-partitions.sh does the following:

  1. Create new replicas on the new brokers as needed
  2. Have them replicate data until they catch up to the leader
  3. Trigger leader elections where needed
  4. Delete replicas where needed
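The four steps above can be modeled per partition with a few list comparisons. This is a simplified Python sketch, not Kafka's controller code: it takes the current replica list, the target list, and the current leader (all hypothetical broker ids) and reports which replicas must be created and caught up, whether a leader election is needed, and which replicas are deleted afterward. Treating an election as needed only when the current leader is dropped from the target set is an assumption of this model.

```python
from typing import Dict, List, Union


def plan_partition_move(current: List[int], target: List[int],
                        leader: int) -> Dict[str, Union[List[int], bool]]:
    """Simplified model of one partition's reassignment (not Kafka's controller code)."""
    # Steps 1-2: brokers in the target set that do not host the partition yet
    # get new replicas, which then fetch from the leader until they are in sync.
    create = [b for b in target if b not in current]
    # Step 3 (assumption): an election is forced only when the current leader
    # is not part of the target replica set.
    elect_leader = leader not in target
    # Step 4: replicas on brokers dropped from the assignment are deleted
    # once the new replicas have caught up.
    delete = [b for b in current if b not in target]
    return {"create": create, "elect_leader": elect_leader, "delete": delete}


# Moving a partition from brokers 1,2,3 to 4,5,3 while broker 1 leads:
# two replicas are created and caught up, leadership must move, 1 and 2 are removed.
print(plan_partition_move([1, 2, 3], [4, 5, 3], leader=1))
```

Under this model, the catch-up phase for the created replicas is where the extra disk and network load comes from, and the election is the moment writes are briefly delayed.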

The leader election phase will delay writes, as any leader failover does. Producers and consumers may also slow down, because the extra replication consumes disk and network resources (sometimes significant resources). You can't cleanly stop a reassignment once it is in progress. You could delete the relevant node from ZooKeeper (/admin/reassign_partitions), but that path hasn't really been tested, and the new replicas already created will stick around. I wouldn't try it. If you are concerned, I recommend moving one partition at a time.
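To follow the one-partition-at-a-time advice, a full plan (for example, the proposed assignment printed by --generate) can be split into single-partition reassignment files. A Python sketch, assuming the standard {"version": 1, "partitions": [...]} reassignment JSON format; the file-name prefix is a made-up choice.

```python
import json
from typing import Any, Dict, List


def split_plan(plan: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split a reassignment plan into one plan per partition, preserving order."""
    version = plan.get("version", 1)
    return [{"version": version, "partitions": [entry]} for entry in plan["partitions"]]


def write_single_partition_files(plan: Dict[str, Any],
                                 prefix: str = "reassign") -> List[str]:
    """Write each single-partition plan to its own JSON file; return the paths."""
    paths = []
    for i, single in enumerate(split_plan(plan)):
        path = f"{prefix}-{i:04d}.json"  # hypothetical naming scheme
        with open(path, "w") as f:
            json.dump(single, f)
        paths.append(path)
    return paths
```

Each file would then be passed to --reassignment-json-file with --execute, and the same file re-run with --verify until it reports completion, before starting the next one. Smaller steps mean less concurrent replication load and a smaller blast radius if something goes wrong.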

In 0.10.1.0 (which is about to enter feature freeze), we'll add the ability to throttle reassignment traffic, which will limit the performance impact on producers and consumers.
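Throttling trades a longer reassignment for less contention. Below is a back-of-envelope Python estimate of how long a new replica needs to catch up under a throttle. It assumes the whole throttle budget goes to one partition, replication runs at exactly that rate, and the partition keeps growing at a steady produce rate; retention and batching are ignored. In releases that include the feature, the limit is passed to kafka-reassign-partitions.sh as --throttle, in bytes/sec. The numbers below are illustrative only.

```python
def catch_up_seconds(partition_bytes: float, throttle_bps: float,
                     produce_bps: float = 0.0) -> float:
    """Rough lower bound on catch-up time for a new replica under a throttle.

    Assumes replication runs at exactly throttle_bps for this one partition
    while producers keep appending produce_bps to it.
    """
    if throttle_bps <= produce_bps:
        return float("inf")  # the follower can never close the gap
    return partition_bytes / (throttle_bps - produce_bps)


# 50 GB partition, 50 MB/s throttle (decimal units):
print(catch_up_seconds(50e9, 50e6))        # no new writes
print(catch_up_seconds(50e9, 50e6, 10e6))  # producers adding 10 MB/s
```

With these numbers the first case takes at least 1,000 seconds and the second at least 1,250 seconds. If the throttle is at or below the produce rate, the new replica never catches up, so the throttle has to leave headroom above the partition's write rate.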