Divs

Reputation: 1618

Kafka __consumer_offsets growing in size

We are using Kafka as a strictly ordered queue, hence a single topic / single partition / single consumer group combination is in use. I should be able to use multiple partitions later in the future.

My consumer is a Spring Boot app listener that produces to and consumes from the same topic(s). So the consumer group is fixed and there is always a single consumer.

Kafka version 0.10.1.1

In this scenario, the log file for topic-0 and a few __consumer_offsets_XX partitions grow. In fact, __consumer_offsets_XX grows very large, even though it is supposed to be cleaned up periodically, every 60 minutes by default. The consumer doesn't read all the time, but it has enable.auto.commit=true.

By default, log.retention.hours (default 7 days) > offsets.retention.minutes (default 1 day); but in my case, since my consumer group/consumer is fixed and single, it may not make sense to keep messages in topic-0 once they are consumed. Should I lower log.retention.hours to as little as 3 days (say)?

Can I lower offsets.retention.minutes to control the growing size of __consumer_offsets_XX without touching the auto-commit settings?
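For reference, these broker settings live in server.properties; a sketch of the defaults being discussed (values are assumptions for Kafka 0.10.x):

```properties
# server.properties excerpt (Kafka 0.10.x defaults, shown for reference)
log.retention.hours=168          # data topics: 7 days
offsets.retention.minutes=1440   # committed offsets: 1 day
log.cleaner.enable=true          # required so __consumer_offsets can be compacted
```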

Upvotes: 2

Views: 10058

Answers (2)

yuranos

Reputation: 9685

The offsets.retention.minutes and log.retention.XXX properties impact the physical removal of records/messages/logs only if the offsets segment files are rolled.

In general, the offsets.retention.minutes property dictates that a broker should forget about your consumer if the consumer has been gone for the specified amount of time, and it can do that even without removing log files from disk.

If you set this value relatively low and check your __consumer_offsets topic while there are no active consumers, over time you will notice something like:

    [group,topic,7]::OffsetAndMetadata(offset=7, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,8]::OffsetAndMetadata(offset=6, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,6]::OffsetAndMetadata(offset=7, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,19]::NULL
    [group,topic,5]::NULL
    [group,topic,22]::NULL

This illustrates how event-store systems like Kafka work in general: they record new events instead of changing existing ones.
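In case it's useful, output like the above can be dumped with the console consumer and the offsets formatter (a sketch; the formatter's fully qualified class name has moved between Kafka versions, so treat the path below as an assumption for newer brokers):

```shell
# Dump the committed offsets stored in __consumer_offsets
# (the formatter class name varies across Kafka versions)
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic __consumer_offsets --from-beginning \
  --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter"
```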

I am not aware of any Kafka version where topics are deleted/cleaned up every 60 minutes by default, and I have a feeling you misinterpreted something in the documentation.

It seems the way __consumer_offsets is managed is very different from regular topics. The only way to get __consumer_offsets segments deleted is to force them to roll. That, however, doesn't happen the same way it does for regular log files: while the log files of your regular data topics are rolled automatically before they are deleted, regardless of the log.roll.XXX properties, __consumer_offsets segments don't do that. And if they are not rolled and stay in the initial ...00000 segment, they are not deleted at all. So, it seems the ways to reduce your __consumer_offsets files are:

  1. Set a relatively small log.roll.ms (or log.roll.hours);
  2. Manipulate offsets.retention.minutes, if you can afford to disconnect your consumers;
  3. Otherwise, adjust the log.retention.XXX properties.
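Option 1 can also be applied per topic rather than broker-wide; a sketch of forcing __consumer_offsets segments to roll hourly via the topic-level segment.ms override (the --zookeeper flag matches the 0.10.x tooling; the host and value are assumptions):

```shell
# Force __consumer_offsets segments to roll every hour so old ones become
# eligible for cleanup (topic-level override, does not touch other topics)
kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --add-config segment.ms=3600000
```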

Upvotes: 1

mrnakumar

Reputation: 657

  1. Changing offsets.retention.minutes will not help. It only frees the space used by the offsets of inactive groups. Assuming you do not have too many inactive group IDs, you don't need it.

  2. Change the retention.bytes config for the offsets topic and set it to a lower value, as per what you want. You can change this topic-level config using kafka-configs.sh or some other way you are aware of.

Once you limit the topic size, Kafka's cleanup will kick in when the topic size reaches the threshold and clean it up for you.
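As a sketch, the topic-level override this answer describes could be applied with kafka-configs.sh (the host and size are assumptions; for a single topic the per-topic property is retention.bytes, applied per partition):

```shell
# Cap the size of the __consumer_offsets topic (limit applies per partition)
kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name __consumer_offsets \
  --add-config retention.bytes=104857600
```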

Upvotes: 0
