How data is fetched: max.poll.records and the offsets

Question

As per my current understanding of Kafka, data is stored in such a way that each "store location" is identified by an "offset" (a numerical value).

I came across this parameter: max.poll.records

Suppose the value of max.poll.records is 5. What does this mean? Does it mean that it will read a total of five "store locations" in one go (for example, will it try to fetch data from offsets 101, 102, 103, 104, 105)?

Can anyone help me understand this?

Mickael Maison · Accepted Answer

First, you are correct: each record is assigned an offset in a partition.

The max.poll.records setting (see the Kafka consumer configuration docs) defines the maximum number of records that the consumer returns each time your application calls poll(). This is a maximum: poll() can return that many records or fewer.
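To make the "that or fewer" behaviour concrete with the offsets from the question, here is a small simplified model in plain Java, with no Kafka client involved. It is an illustration under stated assumptions, not the consumer's real implementation: the hypothetical class MaxPollRecordsModel treats the records already available to the consumer as a queue of offsets 101 to 112 from a single partition, assumed to be consecutive (real offsets can have gaps, and a real poll() may return records from several partitions).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class MaxPollRecordsModel {
    // Simplified model, NOT the real KafkaConsumer: the deque stands in for
    // records (identified only by offset) already available on the client
    // for one partition, assumed to have consecutive offsets.
    static List<Long> poll(Deque<Long> available, int maxPollRecords) {
        List<Long> batch = new ArrayList<>();
        // Hand back records in offset order until the cap is reached or
        // nothing is left, so a batch can be smaller than the cap.
        while (batch.size() < maxPollRecords && !available.isEmpty()) {
            batch.add(available.pollFirst());
        }
        return batch;
    }

    public static void main(String[] args) {
        Deque<Long> available = new ArrayDeque<>();
        for (long offset = 101; offset <= 112; offset++) {
            available.addLast(offset);
        }
        int maxPollRecords = 5; // the value from the question
        while (!available.isEmpty()) {
            System.out.println(poll(available, maxPollRecords));
        }
    }
}
```

Running main prints three batches: offsets 101 to 105, then 106 to 110, then only 111 and 112, because fewer than five records remain for the last call.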

Note that this setting does not directly control how much data is fetched from the cluster, because it is applied on the client side. It only controls the number of records returned by poll().

In the background, the consumer may already have fetched more data, so that it is ready to return it the next time the application calls poll(). How much data the consumer retrieves is determined by fetch.min.bytes, max.partition.fetch.bytes and fetch.max.bytes.
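The fetch-side settings and max.poll.records are separate consumer configuration keys. The following sketch only builds a java.util.Properties object, so it runs without a broker or the Kafka client library. The key names are real Kafka consumer settings; the broker address, the group id (in the hypothetical class FetchVsPollConfig) and all numeric values are assumptions chosen for illustration, not recommendations.

```java
import java.util.Properties;

public class FetchVsPollConfig {
    static Properties consumerProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.setProperty("group.id", "example-group");           // hypothetical group id
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Fetch side: how much data the consumer asks the brokers for.
        // Values are illustrative only.
        props.setProperty("fetch.min.bytes", "1024");
        props.setProperty("max.partition.fetch.bytes", "1048576");
        props.setProperty("fetch.max.bytes", "52428800");

        // Poll side: how many already-fetched records each poll() call
        // hands to the application.
        props.setProperty("max.poll.records", "5");
        return props;
    }

    public static void main(String[] args) {
        // With kafka-clients on the classpath, these properties would be
        // passed to new KafkaConsumer<String, String>(consumerProps()).
        consumerProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Because max.poll.records is applied on the client, lowering it to 5 does not by itself shrink the fetch requests; it changes how the fetched records are handed out across poll() calls.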

This setting lets you control the pace of your application: each call to poll() gives you at most max.poll.records records to process, even when a large number of records are available.
