humility

Reputation: 383

Clarification on the --broker-list option and --bootstrap-server option in Kafka-console producer and consumer respectively

In Kafka-console-producer, the --broker-list option takes a list of servers. Does the producer connect to all of them? Or does the producer use the list to connect to one of them and, if that one fails, switch to the next, and so on?

Similarly, in Kafka-console-consumer the --bootstrap-server option takes a list of Kafka servers. If there are two Kafka servers, do I need to specify both of them in --bootstrap-server? I tried running the consumer with only one server listed (Kafka-server1), and when I stopped Kafka-server1, the consumer continued to receive data for the topic.

Upvotes: 3

Views: 2825

Answers (1)

OneCricketeer

Reputation: 191728

Both options behave the same way.

If you look at the Kafka source code, you'll see both options lead to the same "bootstrap.servers" configuration property:

  def producerProps(config: ProducerConfig): Properties = {
    val props =
      if (config.options.has(config.producerConfigOpt))
        Utils.loadProps(config.options.valueOf(config.producerConfigOpt))
      else new Properties

    props ++= config.extraProducerProps

    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, config.brokerList) // <---- brokerList is passed as BOOTSTRAP_SERVER

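Both the consumer and the producer use the provided addresses only to create an initial "bootstrap" connection. The client contacts one of the listed brokers (any broker can answer, not only the Kafka Controller) and fetches the cluster metadata, which lists all brokers available in the cluster at that time. If one address is unreachable, the client tries another from the list. It is good practice to give at least 3 addresses for high availability.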
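To make that concrete, here is a minimal sketch of how a comma-separated address list ends up under the one "bootstrap.servers" key. It uses only java.util.Properties, so it runs without Kafka on the classpath. The object name, the helper clientProps, and the host names are hypothetical placeholders, not part of Kafka.

```scala
import java.util.Properties

// Hypothetical helper for illustration only; not part of the Kafka code base.
object BootstrapServersSketch {
  // The string value behind ProducerConfig.BOOTSTRAP_SERVERS_CONFIG.
  val BootstrapServersKey = "bootstrap.servers"

  // Joins the addresses with commas, the same format that
  // --broker-list and --bootstrap-server accept on the command line.
  def clientProps(addresses: Seq[String]): Properties = {
    val props = new Properties()
    props.put(BootstrapServersKey, addresses.mkString(","))
    props
  }

  def main(args: Array[String]): Unit = {
    // Placeholder host names; substitute the brokers of your own cluster.
    val props = clientProps(
      Seq("kafka-server1:9092", "kafka-server2:9092", "kafka-server3:9092"))
    println(props.getProperty(BootstrapServersKey))
  }
}
```

A real client would receive these properties in its constructor. The list is only the entry point, so it does not have to name every broker in the cluster.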


If there are two Kafka servers, do I need to specify both of them in the --bootstrap-server?

With regard to having multiple addresses available to use: in a cloud environment, where you might have brokers spread over availability zones, it is recommended to list at least 2 brokers per availability zone, so 6 in total for 3 zones.

The address provided to clients could be simplified with a load balancer / reverse proxy down to a single kafka.your.network:9092 address, but then you are introducing extra DNS and network hops to establish the connection for the sake of having a single, well-known address.

In any case, all available addresses for the brokers will be handed to the clients and then cached locally.

However, it is important to recognize that all send/poll requests will only communicate with the single leader of a TopicPartition, regardless of how many addresses you give and how many replicas a topic has.

Upvotes: 4
