Reputation: 516
We have a Kafka cluster with 4 brokers. We have set up the topic with the following configuration:
replication.factor=3, min.insync.replicas=2
We noticed that whenever a single broker fails, our producers start failing within 60-90 seconds with the error below:
org.apache.kafka.common.errors.TimeoutException: Batch containing 19 record(s) expired due to timeout while requesting metadata from brokers for a-13
[ERROR] ERROR Parser:567 - org.apache.kafka.common.errors.TimeoutException: Batch containing 19 record(s) expired due to timeout while requesting metadata from brokers for a-13
We have the following configs on the producer side:
acks=all
request.timeout.ms=120000
retry.backoff.ms=5000
retries=3
linger.ms=250
max.in.flight.requests.per.connection=2
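For context, assembled as a Java `Properties` object this configuration would look roughly like the sketch below (the class name and bootstrap address are placeholders, not from our actual code):

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties build() {
        Properties props = new Properties();
        // Placeholder bootstrap address; replace with your cluster's brokers.
        props.put("bootstrap.servers", "broker1:9092");
        props.put("acks", "all");                  // wait for all in-sync replicas
        props.put("request.timeout.ms", "120000"); // 2 minutes per request
        props.put("retry.backoff.ms", "5000");     // pause between retries
        props.put("retries", "3");
        props.put("linger.ms", "250");
        props.put("max.in.flight.requests.per.connection", "2");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```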
As per the configuration, shouldn't the producer take at least 6 minutes before failing, since request.timeout.ms is 2 minutes and retries=3?
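For reference, the naive arithmetic behind that estimate can be sketched as below. Note that retries=3 means three retries *in addition to* the initial attempt, so the nominal upper bound would actually be closer to 8 minutes than 6 (class and method names here are illustrative, not part of the Kafka API):

```java
public class RetryBudgetSketch {
    // retries counts retries *after* the first attempt, so attempts = retries + 1.
    static long worstCaseMs(long requestTimeoutMs, long retryBackoffMs, int retries) {
        return (retries + 1) * requestTimeoutMs + (long) retries * retryBackoffMs;
    }

    public static void main(String[] args) {
        // Values from the producer config above.
        System.out.println(worstCaseMs(120_000, 5_000, 3) + " ms"); // prints 495000 ms (~8.25 min)
    }
}
```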
We do not have unclean leader election enabled. We are running Kafka 2.0 and the producer client version is 0.10.0.1.
We have replica.lag.time.max.ms set to 10s on the brokers. When the issue happened, we noticed that leader re-election completed within 40 seconds, so I am confused about why the producers fail almost immediately when one broker goes down.
I can provide more info if required.
Upvotes: 0
Views: 1937
Reputation: 191983
You set acks=all, but did not mention which broker went down.
Sounds like the failed broker hosted one of the topic's partitions, and the ack is failing.
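One way to confirm this is to describe the topic and check, per partition, the leader broker id, the replica list, and the current in-sync replica (ISR) set. The topic name and ZooKeeper address below are placeholders; on Kafka 2.0 the kafka-topics.sh tool still takes --zookeeper rather than --bootstrap-server:

```shell
# Shows Leader, Replicas, and Isr for each partition of the topic.
kafka-topics.sh --zookeeper zk1:2181 --describe --topic your-topic
```

If the dead broker's id appears in a partition's replica list but not its ISR, writes to that partition with acks=all depend on the remaining replicas satisfying min.insync.replicas=2.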
Upvotes: 0