aseychell
aseychell

Reputation: 1804

Kafka LeaderNotAvailableException after days of running

I have an environment with Kafka 0.8.2.1 with Zookeeper 3.4.6 on Java 8 and was working fine on a Linux (Centos7) environment. After a number of days (around 1 month), Kafka is no longer working with message publishing, including from the console-consumer, resulting in the following exceptions.

 [2015-10-23 10:49:25,016] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,026] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,026] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: talBI (kafka.producer.async.DefaultEventHandler) [2015-10-23 10:49:25,138] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,146] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,147] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: talBI (kafka.producer.async.DefaultEventHandler) [2015-10-23 10:49:25,256] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,265] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,265] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: talBI (kafka.producer.async.DefaultEventHandler) [2015-10-23 10:49:25,377] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,390] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,390] ERROR Failed to collate messages by topic, partition due to: Failed to fetch topic metadata for topic: talBI (kafka.producer.async.DefaultEventHandler) [2015-10-23 10:49:25,500] WARN Error while fetching metadata [{TopicMetadata for topic talBI -> No partition metadata for topic talBI due to kafka.common.LeaderNotAvailableException}] for topic [talBI]: class kafka.common.LeaderNotAvailableException  (kafka.producer.BrokerPartitionInfo) [2015-10-23 10:49:25,501] ERROR Failed to send requests for topics talBI with correlation ids in [0,8] (kafka.producer.async.DefaultEventHandler) [2015-10-23 10:49:25,502] ERROR Error in handling batch of 1 events (kafka.producer.async.ProducerSendThread) kafka.common.FailedToSendMessageException: Failed to send messages after 3 tries.
        at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:90)
        at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
        at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
        at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
        at scala.collection.immutable.Stream.foreach(Stream.scala:547)
        at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
        at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)

The describe on the topic is resulting in the following and seems normal:

Topic:talBI     PartitionCount:1        ReplicationFactor:1     Configs:
    Topic: talBI    Partition: 0    Leader: 1       Replicas: 1     Isr: 1

I tried running the kafka-preferred replica-election command line tool with the following command but it is still giving the same problems.

kafka-preferred-replica-election --zookeeper md1qacat01.lnx.ix.com:2181/kafka

This is still on the QA environment since we are evaluating using Kafka on production. Any ideas whether I am missing some configuration or what could have happened?

Upvotes: 0

Views: 339

Answers (1)

codejitsu
codejitsu

Reputation: 3182

I have seen such behaviour when your network settings changed.
You can try with setting advertised.host.name="kafka server hostname" in the KAFKA server.properties and try it again.

Upvotes: 1

Related Questions