Urbanleg

Reputation: 6532

How does Kafka scale out when processing takes a long time?

Assume I have infinite amount of computing power

  1. I have 1 topic with 10 partitions
  2. I have 1 consumer-group
  3. Processing each event takes 1 second
  4. A large number of events starts being produced to the topic

Now, since processing takes a while, and the number of active consumers within a single group is capped at the number of partitions (in this case 10), the rate of consumption ends up far below the rate of production (consumption << production).

How can I leverage my infinite compute in this use case to increase the rate of consumption?

(To my understanding, creating more consumer groups will not solve the problem: each group independently consumes the entire topic, so the work is duplicated rather than shared.)

Upvotes: 0

Views: 116

Answers (1)

OneCricketeer

Reputation: 191743

Kafka consumers fetch records in batches on each poll. By default, a batch holds up to 500 records (`max.poll.records`), so at 1 second per record it takes 500 seconds (over 8 minutes) before the next poll happens. But the default `max.poll.interval.ms` (the maximum time allowed between polls) is 5 minutes, so the broker would evict the consumer from the group mid-batch. At the very least, you need to increase that timeout, or reduce `max.poll.records` to below 300.
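The arithmetic behind that recommendation can be checked directly (the 1-second processing time comes from the question; the other numbers are Kafka's defaults):

```python
# Can a default-sized polled batch be processed within the poll interval?
max_poll_records = 500      # Kafka default for max.poll.records
max_poll_interval_s = 300   # Kafka default max.poll.interval.ms = 5 minutes
per_record_s = 1            # from the question: each event takes 1 second

batch_time_s = max_poll_records * per_record_s
print(batch_time_s)                        # 500 seconds per batch (> 8 min)
print(batch_time_s > max_poll_interval_s)  # True: the consumer gets evicted

# Largest batch that fits inside the default interval at 1 s per record:
safe_batch = max_poll_interval_s // per_record_s
print(safe_batch)  # 300, hence "reduce max.poll.records to below 300"
```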

Alternatively, you can push records into a durable processing queue rather than iterating over each polled batch sequentially. Confluent maintains a parallel consumer project that can help with that.
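The core idea can be sketched without any Kafka dependency: instead of processing a polled batch one record at a time, hand each record to a worker pool. A minimal simulation, where the `process` function, the batch, and the pool size are all illustrative stand-ins (the real 1-second handler is shortened so the example runs quickly):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def process(record):
    """Stand-in for the slow per-event handler (shortened from 1 s)."""
    time.sleep(0.01)
    return record * 2

batch = list(range(100))  # pretend this is one polled batch of records

# Sequential processing would take ~100 * 0.01 s = 1 s.
# With 50 workers the batch finishes in roughly two "waves" (~0.02 s).
start = time.time()
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(process, batch))
elapsed = time.time() - start

print(len(results), elapsed < 1.0)
```

One caveat with a hand-rolled pool: you must only commit offsets once every record in the batch has succeeded, or a crash can silently skip records. Handling that per-record offset bookkeeping is exactly what the parallel consumer project is for.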

It's unclear how you arrived at only 10 partitions, but adding more will spread the load further and let you add more consumers to the group.
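A back-of-the-envelope sizing for the partition count follows from the question's numbers (the incoming rate here is hypothetical; 1 s/event is from the question):

```python
import math

events_per_sec = 120  # hypothetical sustained production rate
per_record_s = 1      # from the question: 1 second per event

# Each consumer can handle at most 1/per_record_s events per second,
# and each consumer in a group needs at least one partition.
consumers_needed = math.ceil(events_per_sec * per_record_s)
partitions_needed = consumers_needed
print(consumers_needed, partitions_needed)  # 120 120
```

With only 10 partitions, 10 consumers at 1 event/second each cap the group at 10 events/second regardless of how much compute is available.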

Upvotes: 1
