welcomeboredom

Reputation: 635

Need to run kafka-storage.sh on every broker and controller

We've been running Kafka with Zookeeper for years. The setup was to form the Zookeeper cluster and then let the brokers connect to it. Everything was in Puppet. No extra steps were needed, and after some restarts the cluster eventually came up.

Now we're migrating to KRaft (in-place) and I noticed I need to format the storage on every controller using the kafka-storage.sh utility. I thought it was only part of the KRaft migration process, but apparently this is needed even when new KRaft clusters are built from scratch, as per https://kafka.apache.org/documentation/#quickstart
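
For reference, the quickstart boils down to something like this (the config path below is just the quickstart's example and will differ per install):

    # generate one cluster ID for the whole cluster
    KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"
    # format this node's log directories with that ID
    bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/server.properties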

I guess I need to swallow my 'why couldn't they just automate it as part of the boot-up process', but I have some follow-up questions I can't find answers to:

  1. Do I need to generate the UUID once and then format the storage on each KRaft controller node separately, using the same UUID?
  2. Do I need to do the same on every broker, or is that somehow automated?
  3. When it comes to the Zookeeper -> KRaft migration, do I need to do it only on the new controllers? https://docs.confluent.io/platform/current/installation/migrate-zk-kraft.html#step-3-format-storage-with-the-id-you-saved-previously

Upvotes: 0

Views: 372

Answers (2)

welcomeboredom

Reputation: 635

Since I went through the whole KRaft migration process, the answers are:

  1. The same UUID must be used on all the brokers and controllers, and kafka-storage.sh must be run separately on each when building a new cluster (see the sketch after this list).
  2. Nothing is automated. kafka-storage.sh must be run on every KRaft controller plus every new broker. It doesn't have to be run on brokers that are being migrated to KRaft (because their storage is already formatted).
  3. In a Zookeeper -> KRaft migration, kafka-storage.sh is run only on the new KRaft controllers.
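
Roughly what that looks like for a multi-node cluster (the server.properties path here is a placeholder; we push the ID out with Puppet, but any mechanism works):

    # run once, on any host that has the Kafka tools installed
    KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"

    # then on every controller and every *new* broker, using that node's own
    # server.properties and the same KAFKA_CLUSTER_ID everywhere:
    bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c /etc/kafka/server.properties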

Upvotes: 0

fh8510

Reputation: 16

I've been looking for this information as well.

The best resource I have found that explains these steps in the context of a multi-node cluster is from Confluent:

https://docs.confluent.io/platform/current/installation/installing_cp/deb-ubuntu.html#ak-in-kraft-mode

Quoting the relevant text from the above doc to answer your questions:

  1. Yes, generate one ID and use the same ID on all cluster nodes.

you must create a unique cluster ID and format the log directories with that ID...Before you start Kafka, you must use the kafka-storage tool with the random-uuid command to generate a cluster ID for each new cluster. You only need one cluster ID, which you will use to format each node in the cluster.

  2. You need to run the storage tool script on each node, using the same cluster ID:
    bin/kafka-storage.sh format -t <CLUSTER_ID> -c <path-to-server.properties>

use the cluster ID to format storage for each node in the cluster with the kafka-storage tool

  3. I'm not sure about your 3rd question regarding migration.

But with regard to your other meta-question, 'why couldn't they just automate it as part of the boot-up process':

Previously, Kafka would format blank storage directories automatically and generate a new cluster ID automatically. One reason for the change is that auto-formatting can sometimes obscure an error condition. This is particularly important for the metadata log maintained by the controller and broker servers. If a majority of the controllers were able to start with an empty log directory, a leader might be able to be elected with missing committed data.

Upvotes: 0
