Reputation: 822
When a Kafka consumer reads from its assigned partitions, is there any particular logic that the consumer fetcher thread(s) use to get data out of the partitions? For example, does the fetcher thread make any effort to read equally/uniformly from the assigned partitions? Does it fetch more records from the most lagging partition? Or is it just a simple round-robin-like logic?
Is there any detailed documentation on the consumer fetching logic?
Thank you.
Upvotes: 1
Views: 482
Reputation: 9102
The order appears to be non-deterministic, according to the discussion linked here. The official Kafka consumer documentation, quoted below, sheds some more light on this:
If a consumer is assigned multiple partitions to fetch data from, it will try to consume from all of them at the same time, effectively giving these partitions the same priority for consumption. However in some cases consumers may want to first focus on fetching from some subset of the assigned partitions at full speed, and only start fetching other partitions when these partitions have few or no data to consume.
One such case is stream processing, where a processor fetches from two topics and performs a join on these two streams. When one of the topics is lagging far behind the other, the processor would like to pause fetching from the topic that is ahead so that the lagging stream can catch up. Another example is bootstrapping at consumer startup, where there is a lot of historical data to catch up on: applications usually want to get the latest data on some of the topics before considering fetching from other topics.
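To make the "pause the one that is ahead" idea concrete, here is a minimal sketch of the decision step only. It is not how the Kafka fetcher works internally, and the class name `LagAwarePauser`, the method `partitionsToPause`, the `tolerance` parameter and the sample partition names are all hypothetical. It also assumes you already have a per-partition lag figure from somewhere, and it uses plain strings instead of Kafka's `TopicPartition` so it runs without the client library.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: decides which partitions are "ahead" and could be paused.
class LagAwarePauser {

    // Returns the partitions whose lag is smaller than the largest lag by more
    // than `tolerance` records. These are the ones a join-style application
    // might pause so that the most lagging partitions can catch up.
    static Set<String> partitionsToPause(Map<String, Long> lagByPartition, long tolerance) {
        Set<String> ahead = new TreeSet<>();
        if (lagByPartition.isEmpty()) {
            return ahead;
        }
        long maxLag = Collections.max(lagByPartition.values());
        for (Map.Entry<String, Long> entry : lagByPartition.entrySet()) {
            if (maxLag - entry.getValue() > tolerance) {
                ahead.add(entry.getKey());
            }
        }
        return ahead;
    }

    public static void main(String[] args) {
        // Made-up lag values, for illustration only.
        Map<String, Long> lag = new HashMap<>();
        lag.put("orders-0", 50_000L);
        lag.put("payments-0", 120L);
        lag.put("payments-1", 49_500L);

        // Only payments-0 is more than 1000 records ahead of the worst partition.
        System.out.println(partitionsToPause(lag, 1_000L)); // prints [payments-0]
    }
}
```

With the real client you would map the result to `TopicPartition` objects and pass them to `KafkaConsumer.pause(Collection<TopicPartition>)` between `poll()` calls, then call `resume(...)` on them once the lagging partitions have caught up. Paused partitions stay assigned to the consumer; `poll()` just stops returning records from them.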
Upvotes: 1