Reputation: 536
We're using spring-cloud-stream-binder-kafka (3.0.3.RELEASE) to send messages to our Kafka cluster (2.4.1). Every now and then one of the producer threads receives NOT_LEADER_FOR_PARTITION exceptions and even exceeds the retries (currently set to 12, activated by the spring-retry dependency). We've restricted the retries because we're sending about 1k msg/s (per producer instance) and were worried about the size of the buffer. This way we're regularly losing messages, which is bad for downstream consumers, because we can't simply reproduce the incoming traffic.
The error messages are:
[Producer clientId=producer-5] Received invalid metadata error in produce request on partition topic-21 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now
[Producer clientId=producer-5] Got error produce response with correlation id 974706 on topic-partition topic-21, retrying (8 attempts left). Error: NOT_LEADER_FOR_PARTITION
[Producer clientId=producer-5] Got error produce response with correlation id 974707 on topic-partition topic-21, retrying (1 attempts left). Error: NOT_LEADER_FOR_PARTITION
Any known way to avoid this? Should we go back to the default of MAX_INT retries? Why does it keep sending to the same broker, even though it responded with NOT_LEADER_FOR_PARTITION?
Any hints are welcome.
EDIT: We just noticed that the broker metric kafka_network_requestmetrics_responsequeuetimems goes up around that time, but the max we've seen is around 2.5s
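For reference, here is roughly how the settings we're talking about map to plain Kafka producer properties (a sketch only; the values and the localhost address are illustrative, not our actual spring-cloud-stream binder configuration):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerRetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // How many times a failed batch (e.g. NOT_LEADER_FOR_PARTITION) is retried.
        // The Kafka client default is Integer.MAX_VALUE; retries are effectively
        // bounded by delivery.timeout.ms rather than by this count.
        props.put(ProducerConfig.RETRIES_CONFIG, 12);

        // Upper bound on the total time a record may spend between send() and
        // success/failure, including all retries (default 120000 ms).
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);

        // Total memory available for buffering unsent records (default 32 MB);
        // this, not the retry count, is what limits memory usage at ~1k msg/s.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}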
Upvotes: 6
Views: 27355
Reputation: 29
On Windows, I was facing an issue like this:
[2023-10-28 14:39:32,522] WARN [Producer clientId=console-producer] Got error produce response with correlation id 6 on topic-partition topicdemo-0, retrying (2 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2023-10-28 14:39:32,524] WARN [Producer clientId=console-producer] Received invalid metadata error in produce request on partition topicdemo-0 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
So what I did was locate the tmp folder, which contains the Kafka and ZooKeeper data, delete it, and rerun the command:
.\bin\windows\kafka-topics.bat --create --topic topicdemok --bootstrap-server localhost:9092
NOTE: This is how I solved it; there might be a more effective way. Do let me know if anyone figures it out. Thanks.
Upvotes: -1
Reputation: 401
My fix (on macOS) was to first stop the ZooKeeper and Kafka servers and any clients, so Kafka is quiet, then delete the state directories:
cd /tmp
rm -rf zookeeper kafka-logs
Then restart ZooKeeper and then Kafka.
I would expect it to be the same on Linux; on Windows you would have to find the directory where the kafka-logs and ZooKeeper state files are stored.
Upvotes: 0
Reputation: 4783
You need to configure the listeners properly. I'm using docker-compose, like this:
services:
  zookeeper:
    container_name: zookeeper
    ports:
      - "2181:2181"
    ...
  broker-1:
    hostname: "broker-1.mydomain.com"
    ports:
      - "29091:29091"
    ...
  broker-2:
    hostname: "broker-2.mydomain.com"
    container_name: broker-2
    ports:
      - "29092:29092"
Edit server.properties for each broker:
broker-1
listeners: PRIVATE_HOSTNAME://broker-1.mydomain.com:9092,PUBLIC_HOSTNAME://broker-1.mydomain.com:29091
advertised.listeners: PRIVATE_HOSTNAME://broker-1.mydomain.com:9092,PUBLIC_HOSTNAME://broker-1.mydomain.com:29091
listener.security.protocol.map: PUBLIC_HOSTNAME:PLAINTEXT,PRIVATE_HOSTNAME:PLAINTEXT
inter.broker.listener.name: PRIVATE_HOSTNAME
broker-2
listeners: PRIVATE_HOSTNAME://broker-2.mydomain.com:9092,PUBLIC_HOSTNAME://broker-2.mydomain.com:29092
advertised.listeners: PRIVATE_HOSTNAME://broker-2.mydomain.com:9092,PUBLIC_HOSTNAME://broker-2.mydomain.com:29092
listener.security.protocol.map: PUBLIC_HOSTNAME:PLAINTEXT,PRIVATE_HOSTNAME:PLAINTEXT
inter.broker.listener.name: PRIVATE_HOSTNAME
IMPORTANT: note that I'm using the same hostname for the private and public networks, because the consumer/producer can only reach Kafka by the registered name, like this:
[BrokerToControllerChannelManager broker=1 name=forwarding]: Recorded new controller, from now on will use broker broker-1.mydomain.com:9092
...
[BrokerToControllerChannelManager broker=2 name=forwarding]: Recorded new controller, from now on will use broker broker-2.mydomain.com:9092
Edit your host's /etc/hosts file:
127.0.0.1 broker-1.mydomain.com
127.0.0.1 broker-2.mydomain.com
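To sanity-check the setup, here is a minimal producer sketch that connects through the public listeners, assuming the /etc/hosts entries above (the topic name test-topic is just an example):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ListenerCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Bootstrap through the PUBLIC_HOSTNAME listeners; the brokers then advertise
        // broker-1.mydomain.com / broker-2.mydomain.com, which resolve via /etc/hosts.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "broker-1.mydomain.com:29091,broker-2.mydomain.com:29092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "test-topic" is just an example topic name.
            producer.send(new ProducerRecord<>("test-topic", "key", "value")).get();
        }
    }
}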
Upvotes: 1
Reputation: 2578
Both Produce and Fetch requests are sent to the leader replica of the partition. NotLeaderForPartitionException is thrown when the request is sent to a broker that is not currently the leader replica for that partition.
The client maintains the leader of each partition in a metadata cache.
The client refreshes this information according to metadata.max.age.ms in the producer configuration. The default value of this property is 300000 ms (5 minutes).
You can go through the following Apache Kafka documentation.
https://kafka.apache.org/documentation/
Please go through the Sender.java code.
You will find both error messages in the Sender code. As noted above, the default value of metadata.max.age.ms is 300000 ms; I think you should reduce this value and then observe the behavior.
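For example, a minimal sketch that lowers the metadata refresh interval on a plain producer (the 30000 ms value and the localhost address are only illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class MetadataRefreshSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Force a metadata refresh at least every 30 s instead of the default 300000 ms,
        // so the producer picks up leader changes sooner (illustrative value).
        props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 30_000);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}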
Upvotes: 8