VB_

Reputation: 45682

Kafka partitions reassignment algorithm and reasons

I find Kafka's partition mechanism awkward. Kafka doesn't support automatic partition reassignment, which leads to the following:

  1. If you want to add nodes, you have to manually run the bin/kafka-reassign-partitions.sh script and write out the partition reassignments for each topic in JSON format.
  2. On broker failure, I suppose the surviving replicas take over without any repartitioning, which can cause hot spotting. Am I right?
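
For context, the JSON file mentioned in point 1 lists, for each topic partition, the ordered broker ids that should hold its replicas. Below is a minimal Python sketch of producing such a file; the topic name `orders`, the broker ids, and the helper `build_reassignment` are made up for illustration, not taken from a real cluster.

```python
import json


def build_reassignment(assignments):
    """Build a reassignment document from a mapping of
    (topic, partition) -> ordered list of broker ids (hypothetical helper)."""
    return {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": list(replicas)}
            for (topic, partition), replicas in sorted(assignments.items())
        ],
    }


# Illustrative input: a two-partition topic with replication factor 2.
plan = build_reassignment({
    ("orders", 0): [1, 2],
    ("orders", 1): [2, 3],
})
print(json.dumps(plan, indent=2))
```

The resulting file is what gets passed to the script via `--reassignment-json-file` together with `--execute`.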

Questions:

  1. Is there any architectural/design reason why Kafka doesn't (or shouldn't) have automatic partition reassignment? Is it because it degrades performance?
  2. What algorithm does bin/kafka-reassign-partitions.sh use for partition reassignment? Does Kafka use any optimizations (e.g. consistent hashing), or raw hash-range partitioning?

Upvotes: 4

Views: 4548

Answers (1)

Mickael Maison

Reputation: 26865

  1. As data is stored on brokers, if you reassign a partition to another broker, all the data has to be copied.

    In addition, to avoid losing any guarantees, you have to maintain extra replicas (both the old ones and the new ones) for the duration of the copy. Note that there is a KIP in progress to improve that specific behaviour (KIP-435).

    Moving data puts extra load on the cluster and can have a significant impact on performance.

  2. The default behaviour of kafka-reassign-partitions.sh is extremely naive, and I really recommend crafting a reassignment file yourself if you intend to use it in a real environment.

    By default, it will reassign all partitions, basically simulating the creation of all topics with the new brokers. While this balances leaders very well, it results in a ton of data to copy.

    In practice, a similar result can be achieved by moving only a small portion of the partitions, thus limiting the data copy and the impact on the cluster.

    If you're not sure how to craft a reassignment file, there are a bunch of tools that can generate and apply reassignments, such as kafka-kit and cruise-control.
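
To illustrate the "move only a small portion" idea, here is a rough Python sketch of one possible greedy approach. It is an assumption for illustration, not what kafka-reassign-partitions.sh, kafka-kit or cruise-control actually do. The function `minimal_move_plan`, the topic name `t` and the broker ids are hypothetical. It only evens out replica counts and ignores partition sizes, rack awareness and leader balance (the first replica in each list is the preferred leader, so replacing it also changes the preferred leader).

```python
import math
from collections import Counter


def minimal_move_plan(current, brokers):
    """Greedy sketch: move replicas off overloaded brokers only, until no
    broker holds more than ceil(total_replicas / len(brokers)) replicas.

    current: dict mapping (topic, partition) -> list of broker ids
    brokers: all broker ids, including newly added (empty) ones
    Returns (new_assignment, number_of_replicas_moved).
    """
    plan = {tp: list(replicas) for tp, replicas in current.items()}
    load = Counter({b: 0 for b in brokers})
    for replicas in plan.values():
        load.update(replicas)

    ceiling = math.ceil(sum(load.values()) / len(brokers))
    moves = 0
    for tp in sorted(plan):
        replicas = plan[tp]
        for i, src in enumerate(replicas):
            if load[src] <= ceiling:
                continue  # source broker is not overloaded, leave replica alone
            # Targets must have spare capacity and not already host this partition.
            candidates = [b for b in brokers
                          if b not in replicas and load[b] < ceiling]
            if not candidates:
                continue
            dst = min(candidates, key=lambda b: (load[b], b))
            replicas[i] = dst
            load[src] -= 1
            load[dst] += 1
            moves += 1
    return plan, moves


# Toy cluster: 6 partitions, replication factor 2, on brokers 1-3; broker 4 is new.
current = {
    ("t", 0): [1, 2], ("t", 1): [2, 3], ("t", 2): [3, 1],
    ("t", 3): [1, 2], ("t", 4): [2, 3], ("t", 5): [3, 1],
}
new_plan, moves = minimal_move_plan(current, brokers=[1, 2, 3, 4])
print(moves, "replicas moved out of", sum(len(r) for r in current.values()))
```

In this toy cluster the sketch moves 3 of the 12 replicas, leaving every broker with 3. Partitions that are not touched can simply be left out of the reassignment file.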

Upvotes: 5
