Kafka broker version: 2.6.2
Kafka Java Apache client version: 3.0.0
Last week, there were multiple instances where all consumers of a particular consumer group kept dying with the error:
consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
I have 200 partitions and 100 consumers, with an aggregate ingestion rate of 800 events/sec. Each event takes roughly 60 ms to handle. max.poll.interval.ms is left at its default of five minutes, and max.poll.records is left at its default of 500. I've recorded the metric time_between_poll_max, and it is only 70-80 s, so I know for a fact that the consumer processing time stayed below max.poll.interval.ms. So what else would trigger this error? I also noticed that poll_idle_ratio_avg was 0 while the consumers were alive, which doesn't make a lot of sense to me.
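A quick back-of-the-envelope check of the figures above, written in Java to match the client in use. It assumes load is spread evenly across the consumers and that events in a batch are handled one at a time; the class and method names are illustrative only.

```java
// Rough poll-budget arithmetic using the numbers from the question.
public class PollBudget {
    // Worst case: one poll() returns a full batch that is processed serially.
    static double worstCaseBatchMs(int maxPollRecords, double handleMs) {
        return maxPollRecords * handleMs;
    }

    // Fraction of wall-clock time each consumer spends processing (assumes even load).
    static double busyFraction(double eventsPerSec, int consumers, double handleMs) {
        return (eventsPerSec / consumers) * handleMs / 1000.0;
    }

    // Largest batch that would still fit inside max.poll.interval.ms.
    static long recordsThatFit(long maxPollIntervalMs, double handleMs) {
        return (long) Math.floor(maxPollIntervalMs / handleMs);
    }

    public static void main(String[] args) {
        int partitions = 200, consumers = 100;
        double eventsPerSec = 800.0, handleMs = 60.0;
        int maxPollRecords = 500;          // default max.poll.records
        long maxPollIntervalMs = 300_000L; // default max.poll.interval.ms (5 min)

        System.out.printf("partitions per consumer: %d%n", partitions / consumers);
        System.out.printf("worst-case batch time: %.0f ms (limit %d ms)%n",
                worstCaseBatchMs(maxPollRecords, handleMs), maxPollIntervalMs);
        System.out.printf("per-consumer busy fraction: %.0f%%%n",
                busyFraction(eventsPerSec, consumers, handleMs) * 100);
        System.out.printf("records that fit in one interval: %d%n",
                recordsThatFit(maxPollIntervalMs, handleMs));
    }
}
```

With these numbers a full 500-record batch takes about 30 s, well under the 300 s limit, and each consumer should be busy roughly 48% of the time. The observed 70-80 s time_between_poll_max is more than twice that worst case, which suggests per-event handling may have been slower than 60 ms during the incident.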
The general pattern is that one consumer gets the timeout error and the group goes into a rebalancing state, which takes forever. During this time, all the other consumers get kicked out of the group due to the poll timeout error.
I see thousands of the following message in the logs:
[Consumer clientId==Consumer-2, groupId=EventsConsumer] Request joining group due to: group is already rebalancing
So I suspect the actual reason for the entire group dying is this extremely long rebalancing phase. I'm not sure what is causing it though.
Are you setting the parameter session.timeout.ms? Its default value is 45 seconds, so increasing it above 70-80 seconds should fix the problem. Note that it must lie between the group.min.session.timeout.ms and group.max.session.timeout.ms values configured on the brokers.
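A minimal sketch of the consumer settings discussed here, using plain java.util.Properties with the documented config keys so it runs without the kafka-clients jar. The 90 s session timeout is only an illustrative value above the observed 70-80 s, and the 6 s / 30 min bounds are assumed broker defaults for group.min.session.timeout.ms and group.max.session.timeout.ms; check the values actually set on your brokers.

```java
import java.util.Properties;

public class SessionTimeoutConfig {
    // Assumed broker defaults; verify group.min/max.session.timeout.ms on your cluster.
    static final long BROKER_MIN_SESSION_MS = 6_000L;
    static final long BROKER_MAX_SESSION_MS = 1_800_000L;

    static Properties consumerProps(long sessionTimeoutMs) {
        // The broker rejects a join request whose session timeout is outside its range.
        if (sessionTimeoutMs < BROKER_MIN_SESSION_MS || sessionTimeoutMs > BROKER_MAX_SESSION_MS) {
            throw new IllegalArgumentException(
                    "session.timeout.ms outside broker range: " + sessionTimeoutMs);
        }
        Properties props = new Properties();
        props.put("group.id", "EventsConsumer");
        props.put("session.timeout.ms", String.valueOf(sessionTimeoutMs));
        // Kafka docs: heartbeat.interval.ms should typically be no higher than 1/3 of session.timeout.ms.
        props.put("heartbeat.interval.ms", String.valueOf(sessionTimeoutMs / 3));
        props.put("max.poll.interval.ms", "300000"); // default, 5 minutes
        props.put("max.poll.records", "500");        // default
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps(90_000L); // illustrative: above the observed 70-80 s
        props.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a real application these properties, plus bootstrap.servers and the key/value deserializers, would be passed to the KafkaConsumer constructor.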
Decoupling session.timeout.ms from max.poll.interval.ms was introduced by KIP-62.