Reputation: 2135
I cannot find anywhere in the Kafka documentation how a consumer consumes data from multiple partitions, in particular how it decides which partition to read from next and whether a poll guarantees any "fairness" across partitions.
Where in the documentation can I find confirmation of the behaviour I observe (and want for my use case), where all partitions are consumed evenly and fairly regardless of lag?
Here is the example:
When the consumer is faster than the producer, i.e. the consumer's total lag for the topic is 0, fairness is not an issue.
However, when the consumer cannot process messages as fast as the producer produces them and lag builds up, Kafka could end up with, for example, 1000000000 messages in the 1st partition and 1000000 messages in each of the other 9 partitions.
Now, let's say the consumer processes 10000 messages/minute, receiving batches of 100 messages each. What rate of consumed messages will the consumer observe for each partition? Is there any guarantee that "on average" the consumer would observe 1000 messages/minute for each partition? Is there any prioritisation of partitions with a larger lag? Is there any de-prioritisation of partitions with a smaller lag? In my experience, the smaller partitions do not suffer because of the larger ones, but I cannot find a guarantee of this behaviour in the Kafka documentation. Could you please point me to where it is documented?
Upvotes: 1
Views: 986
Reputation: 97
On each poll, the consumer gets as many records as it can from each partition.
"We pull as many records as possible from each partition in a round-robin fashion. In the same example, we would first try to pull all records from A. If there was still space available, we would then pull whatever was left from B and so on. we'd keep track of which partition we left off at so that the next iteration would begin there"
In your idealized example, the largest partition would still have lag built up, while the others would all be fully consumed.
Here is a similar question: "Are Kafka partitions consumed evenly?"
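To make the round-robin behaviour quoted above concrete, here is a minimal simulation sketch. It is a simplified model and not the actual consumer code: it assumes every fetch returns at most `fetch_chunk` records per partition, that a poll drains buffered chunks in order up to `max_poll_records`, and that an exhausted partition is re-fetched and moved to the back of the queue. The function name `simulate` and all parameter values are made up for illustration.

```python
from collections import deque


def simulate(lags, max_poll_records=100, fetch_chunk=500, polls=1000):
    """Return how many records were consumed per partition after `polls` polls.

    Simplified model (an assumption, not the real client implementation).
    """
    remaining = list(lags)        # records not yet fetched, per partition
    consumed = [0] * len(lags)    # records handed to the application, per partition
    buffered = deque()            # fetched chunks: [partition, records_left_in_chunk]

    def fetch(partition):
        # Fetch at most `fetch_chunk` records and queue them at the back.
        n = min(fetch_chunk, remaining[partition])
        if n > 0:
            remaining[partition] -= n
            buffered.append([partition, n])

    for partition in range(len(lags)):
        fetch(partition)

    for _ in range(polls):
        budget = max_poll_records
        while budget > 0 and buffered:
            chunk = buffered[0]   # continue where the previous poll left off
            take = min(budget, chunk[1])
            chunk[1] -= take
            consumed[chunk[0]] += take
            budget -= take
            if chunk[1] == 0:
                # Chunk drained: re-fetch this partition, which sends it to the back.
                buffered.popleft()
                fetch(chunk[0])
    return consumed


# The lag distribution from the question: one huge partition, nine smaller ones.
print(simulate([1_000_000_000] + [1_000_000] * 9))
# Small partitions run dry first, after which the large one gets every poll.
print(simulate([10_000, 1_000, 1_000], polls=60))
```

Under these assumptions every partition is served at the same rate while it still has data, whatever its lag, and once the small partitions are drained the whole poll budget goes to the large one.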
Upvotes: 0
Reputation: 1486
There is no out-of-the-box way to handle such prioritization from the consumer side.
Is there any guarantee that "on average" the consumer would observe 1000 messages
This depends entirely on the consumer's processing time per message.
Is there any prioritisation for partitions with a larger lag?
No, there is no straightforward way to do this through consumer configuration. However, there are alternatives that can help mitigate such a large lag. A couple of these options are:
1. Multiple-topics approach: split your messages across topics by priority level. For example, the messages in the partition with the high lag can be moved to a separate topic, which gives you more control over parallelism and, in turn, over throughput.
2. Bucket priority approach: split your topic's partitions into logical regions. For example, in a 10-partition topic, [p0-p5] could be dedicated to high-priority messages, [p6-p8] to medium-priority messages, and so on. To control this partitioning you need to implement a custom partitioner on the producer side, and you can take it as far as you need: if you already have key-based partitioning logic, you can decorate it with the bucket priority approach.
Here is an article about the bucket priority pattern by Confluent
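The partition-selection logic behind the bucket priority approach can be sketched as below. This is only an illustration in plain Python, not a real Kafka partitioner: the `BUCKETS` layout, the `bucket_partition` function, and the use of `zlib.crc32` as the key hash are all assumptions made for the example (a producer-side partitioner would use its own hashing), and giving p9 to low priority is just one way to read "and so on".

```python
import zlib

# Hypothetical layout for a 10-partition topic.
BUCKETS = {
    "high": range(0, 6),     # p0-p5
    "medium": range(6, 9),   # p6-p8
    "low": range(9, 10),     # p9 (assumed; not specified in the answer)
}


def bucket_partition(priority: str, key: bytes, buckets=BUCKETS) -> int:
    """Pick a partition inside the bucket reserved for `priority`.

    The key is hashed so that the same key always lands on the same
    partition within its bucket, preserving per-key ordering.
    """
    partitions = buckets[priority]
    return partitions[zlib.crc32(key) % len(partitions)]


print(bucket_partition("high", b"order-42"))
print(bucket_partition("medium", b"order-42"))
```

Consumers can then be assigned to the partitions of one bucket only, so a backlog in one priority level does not hold up the others.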
Upvotes: -1