user3679686

Kafka producers failing when one Kafka broker goes down

We have a Kafka cluster with 4 brokers. The topic is configured with replication.factor=3 and min.insync.replicas=2.
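With replication.factor=3 and min.insync.replicas=2, losing one of a partition's three replicas should still leave two in-sync replicas. That satisfies acks=all once a new leader has been elected. The check below is a minimal sketch of that arithmetic. It assumes that the surviving replicas stay in the ISR, and the function name is hypothetical.

```python
def acks_all_writable(replication_factor: int, min_insync_replicas: int, replicas_down: int) -> bool:
    """Return True if an acks=all produce to one partition can still succeed.

    Simplified model (assumption): every surviving replica remains in the ISR
    and a new leader has already been elected from it.
    """
    in_sync = replication_factor - replicas_down
    return in_sync >= min_insync_replicas


# Topic settings from the question: replication.factor=3, min.insync.replicas=2.
for down in range(4):
    # With 3 replicas down there is no leader at all, so writes fail as well.
    print(down, acks_all_writable(3, 2, down))
```

Under this model only the loss of two or more of a partition's replicas would make acks=all writes fail. A single broker failure on its own should not.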

We noticed that whenever a single broker fails, our producers start failing within 60-90 seconds with the following error:

org.apache.kafka.common.errors.TimeoutException: Batch containing 19 record(s) expired due to timeout while requesting metadata from brokers for a-13
[ERROR] ERROR Parser:567 - org.apache.kafka.common.errors.TimeoutException: Batch containing 19 record(s) expired due to timeout while requesting metadata from brokers for a-13

We use the following producer configuration:

acks=all
request.timeout.ms=120000
retry.backoff.ms=5000
retries=3
linger.ms=250
max.in.flight.requests.per.connection=2

Given this configuration, shouldn't the producer take at least 6 minutes before failing, since request.timeout.ms is 2 minutes and retries=3?
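The 6-minute estimate assumes that each retry gets a fresh request.timeout.ms. That reasoning only covers requests that were actually sent to a broker. The error text says the batch expired "while requesting metadata", which suggests that it never left the producer's buffer. Kafka 2.1 introduced delivery.timeout.ms (KIP-91) to bound that whole period. In older clients such as 0.10.0.1, such a batch is, to my understanding, expired on request.timeout.ms alone and retries are never consumed. The sketch below compares the two bounds. The function names and the exact expiry formula are simplifying assumptions, not the client's actual code.

```python
# Producer settings from the question.
REQUEST_TIMEOUT_MS = 120_000
RETRY_BACKOFF_MS = 5_000
RETRIES = 3
LINGER_MS = 250


def retry_budget_ms(request_timeout_ms: int, retry_backoff_ms: int, retries: int) -> int:
    """Model from the question, made explicit: one initial attempt plus
    `retries` retries, each allowed a full request timeout, with a backoff
    before every retry."""
    return (retries + 1) * request_timeout_ms + retries * retry_backoff_ms


def accumulator_expiry_ms(request_timeout_ms: int, linger_ms: int) -> int:
    """Assumed behaviour of pre-2.1 producers for a batch that is never sent
    because no partition leader is known: it expires roughly request.timeout.ms
    after its linger period ends, without consuming any retries."""
    return linger_ms + request_timeout_ms


print(retry_budget_ms(REQUEST_TIMEOUT_MS, RETRY_BACKOFF_MS, RETRIES) / 60_000)  # 8.25 (minutes)
print(accumulator_expiry_ms(REQUEST_TIMEOUT_MS, LINGER_MS) / 1000)  # 120.25 (seconds)
```

The observed 60-90 seconds is shorter than both figures, so this model alone does not fully explain the timing. It does show why retries=3 need not stretch the failure window to several minutes.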

We do not have unclean leader election enabled. We are running Kafka 2.0 and the producer client version is 0.10.0.1.

replica.lag.time.max.ms is set to 10s on the brokers. When the issue happened, we noticed that leader re-election completed within 40 seconds. So I am confused why the producers fail so quickly when one broker goes down.

I can provide more info if required.

Answers (1)

OneCricketeer

You set acks=all but didn't mention which broker went down.

It sounds like the failed broker hosted a replica of one of the topic's partitions, and the acknowledgement is failing.
