Shubham Chouksey

Reputation: 7

Kafka Cluster with Multiple Brokers and its failure cases

There are 3 Kafka brokers running on localhost:9092, localhost:9093, and localhost:9094 respectively, and a ZooKeeper instance running on localhost:2181.

A topic is created with a replication factor of 3.
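The topic was created roughly like this (a sketch; the topic name and the min.insync.replicas=2 setting are taken from the --describe output shown further down):

# Sketch: create a single-partition topic replicated across all three brokers
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic write-queue-topic \
  --partitions 1 \
  --replication-factor 3 \
  --config min.insync.replicas=2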

What is the maximum number of broker failures the cluster can tolerate while still functioning properly and without losing data?

Also, when all three brokers went down one by one and I then started one of them again (a random one of the three), the producer gave this error:

Received invalid metadata error in produce request on partition myTopic due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)

Why doesn't it work with one broker running, when we have set the topic to one partition and a replication factor of 3 just for safety?

It should have kept working even with two brokers down.

Edit: When all the brokers are up and running properly, the ISR is 1,2,3. When I killed the leader of the topic, leader reassignment worked fine and broker 1 became the leader, and we get the following response on describing the topic:

Topic: write-queue-topic        TopicId: x9nnEz-dR4-PxH6hvhcoKQ PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=2
        Topic: write-queue-topic        Partition: 0    Leader: 1       Replicas: 3,1,2 Isr: 1,2
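For reference, the describe output in this post comes from the standard topic describe command, something like the following (pointing --bootstrap-server at whichever broker is still alive; older Kafka versions use --zookeeper localhost:2181 instead):

kafka-topics.sh --describe \
  --bootstrap-server localhost:9092 \
  --topic write-queue-topic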

Then I killed the new leader, i.e. broker 1; the leadership shifted to the remaining alive broker, i.e. broker 2, and the --describe command gives us:

Topic: write-queue-topic        TopicId: x9nnEz-dR4-PxH6hvhcoKQ PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=2
        Topic: write-queue-topic        Partition: 0    Leader: 2       Replicas: 3,1,2 Isr: 2

But when I then killed the only alive broker, i.e. broker 2, I still get the ISR as 2, even though broker 2 is not active anymore:

Topic: write-queue-topic        TopicId: x9nnEz-dR4-PxH6hvhcoKQ PartitionCount: 1       ReplicationFactor: 3    Configs: min.insync.replicas=2
        Topic: write-queue-topic        Partition: 0    Leader: none    Replicas: 3,1,2 Isr: 2

On bringing broker 1 back up, it still shows the leader as none. Leader reassignment should have worked and broker 1 should have become the leader; instead, nothing works until the last broker that was alive (here broker 2) is started again.

Upvotes: 0

Views: 1443

Answers (1)

OneCricketeer

Reputation: 191671

Your "error" seems like a debug warning... It's saying the client is requesting new metadata.


Why doesn't it work with one broker running when we have set the topic to one partition and a replication factor of 3

If you have min.insync.replicas=2 and fewer than 2 in-sync replicas are available, then the produce request will fail
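You can see this with the console producer (a sketch, reusing the topic from the question; older versions take --broker-list instead of --bootstrap-server): with only one broker alive and min.insync.replicas=2, an acks=all produce should be rejected with a NotEnoughReplicas error, whereas acks=1 would still be accepted, at the risk of losing that record if the last broker dies.

# Sketch: with two of the three brokers down, sends from this producer
# should fail, because acks=all requires min.insync.replicas=2 in-sync replicas
kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic write-queue-topic \
  --producer-property acks=all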


The maximum is two (though the real answer is a single machine or disk failure, since everything here runs locally), assuming you have min.insync.replicas=2 on the broker and acks=all in your producer config

From Cloudera Documentation

The following recommendations for Kafka configuration settings make it extremely difficult for data loss to occur.

Producer

  • block.on.buffer.full=true
  • retries=Long.MAX_VALUE
  • acks=all
  • max.in.flight.requests.per.connection=1

Remember to close the producer when it is finished or when there is a long pause.
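As a sketch, those settings map onto a producer config file roughly as follows. Note that in current clients retries is an int config (so 2147483647, i.e. Integer.MAX_VALUE, is the effective maximum) and block.on.buffer.full has since been superseded by max.block.ms:

# producer.properties (sketch of the settings above)
bootstrap.servers=localhost:9092,localhost:9093,localhost:9094
acks=all
retries=2147483647
max.in.flight.requests.per.connection=1
# block.on.buffer.full no longer exists in newer clients; max.block.ms
# controls how long send() may block when the buffer is full
max.block.ms=60000

This file can be passed to the console producer with --producer.config producer.properties, or loaded into a Properties object for the Java client.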

Broker

  • Topic replication.factor >= 3
  • min.insync.replicas = 2
  • Disable unclean leader election
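As a sketch, the topic-level pieces of this can be checked or applied with kafka-configs (on recent Kafka versions; older ones use --zookeeper localhost:2181 instead of --bootstrap-server):

# Sketch: enforce min ISR and disable unclean leader election for the topic
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name write-queue-topic \
  --alter --add-config min.insync.replicas=2,unclean.leader.election.enable=false

Disabled unclean leader election (the default in recent Kafka versions) is also why, in your scenario, broker 1 cannot take over after restarting: it had already fallen out of the ISR, so the partition stays leaderless until the last in-sync broker (broker 2) comes back.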

Upvotes: 2
