Reputation: 536
We're using spring-cloud-stream-binder-kafka (3.0.3.RELEASE) to send messages to our Kafka cluster (2.4.1). Every now and then one of the producer threads receives NOT_LEADER_FOR_PARTITION exceptions and even exceeds the retries (currently set to 12, activated by the spring-retry dependency). We've restricted the retries because we're sending about 1k msg/s (per producer instance) and were worried about the size of the buffer. This way we're regularly losing messages, which is bad for downstream consumers, because we can't simply reproduce the incoming traffic.
The error messages are:
[Producer clientId=producer-5] Received invalid metadata error in produce request on partition topic-21 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now
[Producer clientId=producer-5] Got error produce response with correlation id 974706 on topic-partition topic-21, retrying (8 attempts left). Error: NOT_LEADER_FOR_PARTITION
[Producer clientId=producer-5] Got error produce response with correlation id 974707 on topic-partition topic-21, retrying (1 attempts left). Error: NOT_LEADER_FOR_PARTITION
Any known way to avoid this? Should we go back to the default of MAX_INT retries? Why does it keep sending to the same broker, even though it responded with NOT_LEADER_FOR_PARTITION?
Any hints are welcome.
EDIT: We just noticed that the broker metric kafka_network_requestmetrics_responsequeuetimems goes up around that time, but the max we've seen is around 2.5s
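For reference, here is roughly how the settings we're talking about map to plain Kafka producer properties (a sketch only; the values and the localhost address are illustrative, not our actual spring-cloud-stream binder configuration):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerRetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // How many times a failed batch (e.g. NOT_LEADER_FOR_PARTITION) is retried.
        // The Kafka client default is Integer.MAX_VALUE; retries are effectively
        // bounded by delivery.timeout.ms rather than by this count.
        props.put(ProducerConfig.RETRIES_CONFIG, 12);

        // Upper bound on the total time a record may spend between send() and
        // success/failure, including all retries (default 120000 ms).
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);

        // Total memory available for buffering unsent records (default 32 MB);
        // this, not the retry count, is what limits memory usage at ~1k msg/s.
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 32 * 1024 * 1024);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}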
Upvotes: 6
Views: 27355
Reputation: 29
On Windows, I was facing an issue like this:
[2023-10-28 14:39:32,522] WARN [Producer clientId=console-producer] Got error produce response with correlation id 6 on topic-partition topicdemo-0, retrying (2 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender)
[2023-10-28 14:39:32,524] WARN [Producer clientId=console-producer] Received invalid metadata error in produce request on partition topicdemo-0 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender)
So what I did was locate the tmp folder, which contains the Kafka and ZooKeeper data, delete it, and rerun the command:
.\bin\windows\kafka-topics.bat --create --topic topicdemok --bootstrap-server localhost:9092
NOTE: This is how I solved it; there might be a more effective way. Do let me know if anyone figures it out. Thanks.
Upvotes: -1
Reputation: 401
My fix (on macOS) was to first stop the ZooKeeper and Kafka servers and any clients, so Kafka is quiet, then delete the state directories:
cd /tmp
rm -rf zookeeper kafka-logs
Then restart ZooKeeper and then Kafka.
I would expect it to be the same on Linux; on Windows you would have to find the directory where the kafka-logs and ZooKeeper state files are stored.
Upvotes: 0
Reputation: 4783
You need to configure the listeners properly. I'm using docker-compose, like this:
services:
  zookeeper:
    container_name: zookeeper
    ports:
      - "2181:2181"
    ...
  broker-1:
    hostname: "broker-1.mydomain.com"
    ports:
      - "29091:29091"
    ...
  broker-2:
    hostname: "broker-2.mydomain.com"
    container_name: broker-2
    ports:
      - "29092:29092"
Edit server.properties for each broker:
broker-1
listeners: PRIVATE_HOSTNAME://broker-1.mydomain.com:9092,PUBLIC_HOSTNAME://broker-1.mydomain.com:29091
advertised.listeners: PRIVATE_HOSTNAME://broker-1.mydomain.com:9092,PUBLIC_HOSTNAME://broker-1.mydomain.com:29091
listener.security.protocol.map: PUBLIC_HOSTNAME:PLAINTEXT,PRIVATE_HOSTNAME:PLAINTEXT
inter.broker.listener.name: PRIVATE_HOSTNAME
broker-2
listeners: PRIVATE_HOSTNAME://broker-2.mydomain.com:9092,PUBLIC_HOSTNAME://broker-2.mydomain.com:29092
advertised.listeners: PRIVATE_HOSTNAME://broker-2.mydomain.com:9092,PUBLIC_HOSTNAME://broker-2.mydomain.com:29092
listener.security.protocol.map: PUBLIC_HOSTNAME:PLAINTEXT,PRIVATE_HOSTNAME:PLAINTEXT
inter.broker.listener.name: PRIVATE_HOSTNAME
IMPORTANT: note that I'm using the same hostname for the private and public networks, because the consumer/producer can only reach Kafka by the registered name, like this:
[BrokerToControllerChannelManager broker=1 name=forwarding]: Recorded new controller, from now on will use broker broker-1.mydomain.com:9092
...
[BrokerToControllerChannelManager broker=2 name=forwarding]: Recorded new controller, from now on will use broker broker-2.mydomain.com:9092
Edit your host's /etc/hosts file:
127.0.0.1 broker-1.mydomain.com
127.0.0.1 broker-2.mydomain.com
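To sanity-check the setup, here is a minimal producer sketch that connects through the public listeners, assuming the /etc/hosts entries above (the topic name test-topic is just an example):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ListenerCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Bootstrap through the PUBLIC_HOSTNAME listeners; the brokers then advertise
        // broker-1.mydomain.com / broker-2.mydomain.com, which resolve via /etc/hosts.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                  "broker-1.mydomain.com:29091,broker-2.mydomain.com:29092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // "test-topic" is just an example topic name.
            producer.send(new ProducerRecord<>("test-topic", "key", "value")).get();
        }
    }
}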
Upvotes: 1
Reputation: 2578
Both Produce and Fetch requests are sent to the leader replica of the partition. NotLeaderForPartitionException is thrown when the request is sent to a broker that is not currently the leader replica for that partition.
The client maintains the leader of each partition in a metadata cache.
The client refreshes this information according to metadata.max.age.ms in the producer configuration. The default value of this property is 300000 ms (5 minutes).
You can go through the following Apache Kafka documentation.
https://kafka.apache.org/documentation/
Please go through the Sender.java code.
You will find both error messages in the Sender code. As noted above, the default value of metadata.max.age.ms is 300000 ms; I think you should reduce this value and then observe the behavior.
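For example, a minimal sketch that lowers the metadata refresh interval on a plain producer (the 30000 ms value and the localhost address are only illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class MetadataRefreshSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // illustrative address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

        // Force a metadata refresh at least every 30 s instead of the default 300000 ms,
        // so the producer picks up leader changes sooner (illustrative value).
        props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, 30_000);

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}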
Upvotes: 8