Reputation: 385
I have a Kafka cluster of 3 machines, each running 1 ZooKeeper node and 1 broker. I use kafka_exporter to monitor the consumer lag metric, and it works fine in the normal case. But when I kill 1 broker, Prometheus cannot scrape http://machine1:9308/metric (the kafka_exporter metrics endpoint): the request takes so long (about 1.5 minutes) that it times out. If I then restart kafka_exporter, I see errors like this:
Cannot get leader of topic __consumer_offsets partition 20: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes
Running kafka-topics.bat --describe --zookeeper machine1:2181,machine2:2181,machine3:2181 --topic __consumer_offsets gives this result:
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets Partition: 0 Leader: -1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 2 Isr: 2
Topic: __consumer_offsets Partition: 49 Leader: 2 Replicas: 2 Isr: 2
Is this a configuration error? Is "Leader: -1" an error? How can I get the consumer lag in this case? And if I shut down machine 1 permanently, will everything still work?
Upvotes: 2
Views: 1670
Reputation: 498
A leader of -1 means the partition currently has no leader: the only broker holding a replica of that partition is down, and no other broker in the cluster has a copy of the data to take over.
The problem in your case is that the replication factor of the topic __consumer_offsets is 1, which means each partition of the topic is hosted on exactly one broker. If you lose any one broker, all the partitions on that broker become unavailable, so the topic can no longer be read in full. That is why kafka_exporter fails to read from it.
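To see which partitions are affected, you can scan the describe output for lines with Leader: -1. This is only a rough sketch: it assumes the output looks like the lines shown in the question, and the function name leaderless_partitions is made up for illustration.

```python
import re

# Matches the per-partition lines of the describe output shown in the question.
# The summary line ("PartitionCount:50 ...") does not match and is skipped.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def leaderless_partitions(describe_output):
    """Return (topic, partition, replicas) for every partition whose leader is -1."""
    result = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match and int(match.group("leader")) == -1:
            replicas = [int(r) for r in match.group("replicas").split(",")]
            result.append((match.group("topic"), int(match.group("partition")), replicas))
    return result


# Sample input copied from the question (shortened).
sample = """\
Topic:__consumer_offsets PartitionCount:50 ReplicationFactor:1 Configs:compression.type=producer
Topic: __consumer_offsets Partition: 0 Leader: -1 Replicas: 1 Isr: 1
Topic: __consumer_offsets Partition: 1 Leader: 2 Replicas: 2 Isr: 2
"""
print(leaderless_partitions(sample))  # [('__consumer_offsets', 0, [1])]
```

Every partition it reports has its only replica on the broker that is down, so consumer groups whose offsets live in those partitions cannot be read until that broker comes back.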
To keep exporting consumer offsets after losing a broker, reconfigure the topic __consumer_offsets to have a replication factor greater than 1.
Recommended configuration: replication factor 3, min.insync.replicas 2.
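The replication factor of an existing topic is changed through a partition reassignment, not by editing a topic setting. As a hedged sketch, the script below generates the JSON plan that kafka-reassign-partitions expects. It assumes the brokers have IDs 1, 2 and 3 (only 1 and 2 are visible in your output) and that the topic has 50 partitions; build_reassignment is a hypothetical helper name.

```python
import json


def build_reassignment(topic, partition_count, broker_ids, replication_factor):
    """Build a reassignment plan that places each partition on several brokers.

    The broker list is rotated per partition so that the first replica
    (the preferred leader) is spread evenly across the brokers.
    """
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(partition_count):
        start = partition % len(broker_ids)
        rotated = broker_ids[start:] + broker_ids[:start]
        partitions.append({
            "topic": topic,
            "partition": partition,
            "replicas": rotated[:replication_factor],
        })
    return {"version": 1, "partitions": partitions}


# Assumed values: broker IDs 1, 2, 3 and the 50 partitions from the describe output.
plan = build_reassignment("__consumer_offsets", 50, [1, 2, 3], 3)
print(json.dumps(plan, indent=2))
```

Redirect the output to a file, for example increase-replication.json, and apply it while all three brokers are up:

kafka-reassign-partitions.bat --zookeeper machine1:2181,machine2:2181,machine3:2181 --reassignment-json-file increase-replication.json --execute

This form matches Kafka versions whose tools still accept --zookeeper, as in your describe command; newer versions take --bootstrap-server instead. Note that the broker setting offsets.topic.replication.factor only takes effect when __consumer_offsets is first created, so it does not fix an existing topic.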
Upvotes: 1