Reputation: 31895
[1] 2022-01-18 21:56:10,280 ERROR [org.apa.cam.pro.err.DefaultErrorHandler] (Camel (camel-1) thread #9 - KafkaProducer[test]) Failed delivery for (MessageId: 95835510BC9E9B2-0000000000134315 on ExchangeId: 95835510BC9E9B2-0000000000134315). Exhausted after delivery attempt: 1 caught: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
[1]
[1] Message History (complete message history is disabled)
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] RouteId ProcessorId Processor Elapsed (ms)
[1] [route1 ] [route1 ] [from[netty://udp://0.0.0.0:8080?receiveBufferSize=65536&sync=false] ] [ 125320]
[1] ...
[1] [route1 ] [to1 ] [kafka:test?brokers=10.99.155.100:9092&producerBatchSize=0 ] [ 0]
[1]
[1] Stacktrace
[1] ---------------------------------------------------------------------------------------------------------------------------------------
[1] : org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for test-0:121924 ms has passed since batch creation
Here's the flow for my project:
Components 1 and 2 are running in one Kubernetes pod and component 3 is running in a separate pod.
At the beginning, I encountered a TimeoutException saying:
org.apache.kafka.common.errors.TimeoutException: Expiring 20 record(s) for test-0:121924 ms has passed since batch creation
I searched online and found a couple of potential solutions, e.g. "Kafka Producer error Expiring 10 record(s) for TOPIC:XXXXXX: 6686 ms has passed since batch creation plus linger time".
Based on the suggestions, I have done:
Unfortunately, I still encounter the error because the memory is used up.
Does anyone know how to solve it? Thanks!
Upvotes: 1
Views: 2524
Reputation: 1232
There are several things to take into account here.
You are not showing what your throughput is; you have to take that value into account and check whether your broker at 10.99.155.100:9092 is able to process such a load.
Did you check 10.99.155.100 during the time of the transfer? The fact that Kafka can potentially process hundreds of thousands of messages per second doesn't mean that you can do it on any hardware.
So, having said that, a timeout is the first thing that comes to mind, but in your case you have 2 minutes and you are still timing out; to me, this sounds more like a problem with your broker than with your producer.
To understand the issue: basically, you are filling your mouth faster than you can swallow. By the time you push a message, the broker is not able to acknowledge it in time (in this case, within 2 minutes).
What you can do here:

- Increase delivery.timeout.ms to an acceptable value; I guess you have SLAs to meet.
- Increase your retry backoff timer (retry.backoff.ms).
- Do not set the batch size to 0; this tries a live push to the broker, which does not seem possible for this load.
- Make sure your max.block.ms is set correctly.
- Change to bigger batches (even if this increases latency), but not too big: you need to sit down, check how many records you are pushing, and size the batches accordingly.

Now, some rules:

- delivery.timeout.ms must be bigger than the sum of request.timeout.ms and linger.ms.
- All of the above are impacted by batch.size.
- If you don't have that many records, but those records are huge, then control max.request.size.
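The timing rule above can be sketched as a quick sanity check. The numeric values below are illustrative assumptions, not values from the original post:

```python
# Illustrative producer settings (assumed values, not from the original post).
producer_config = {
    "delivery.timeout.ms": 180_000,  # total time allowed for a send, incl. retries
    "request.timeout.ms": 60_000,    # time to wait for a single broker response
    "linger.ms": 50,                 # how long to wait to fill a batch
    "batch.size": 16_384,            # batch size in bytes (the Kafka default)
    "max.request.size": 1_048_576,   # max size of a single request in bytes
}

def timing_rule_ok(cfg: dict) -> bool:
    """Check the constraint the answer describes:
    delivery.timeout.ms must be at least request.timeout.ms + linger.ms."""
    return cfg["delivery.timeout.ms"] >= cfg["request.timeout.ms"] + cfg["linger.ms"]
```

If the rule is violated, the Kafka producer refuses the configuration at startup, so it is worth validating before deploying.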
So, to summarize, the properties to change are the following: delivery.timeout.ms, request.timeout.ms, linger.ms, max.request.size.
Assuming the hardware is good, and also assuming that you are not sending more than you should, those should do the trick.
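Applied to the Camel Kafka endpoint from the route in the question, the summary could look roughly like this. The option names follow camel-kafka's producer endpoint options and the values are illustrative assumptions, not tested numbers:

```
kafka:test?brokers=10.99.155.100:9092
    &producerBatchSize=16384
    &lingerMs=50
    &requestTimeoutMs=60000
    &deliveryTimeoutMs=180000
    &maxRequestSize=1048576
```

Note that producerBatchSize is no longer 0, and deliveryTimeoutMs (180000) is greater than requestTimeoutMs + lingerMs (60050), satisfying the rule above.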
Upvotes: 1