ceb

Reputation: 39

Is there any side effect of setting the repartition.purge.interval.ms config to a very high value in order to prevent purging?

We are using Kafka Streams in our application to process events.

For the join operation we use both the repartition and selectKey methods of Kafka Streams, which causes internal repartition topics to be created with -1 (infinite) retention.

streamsBuilder
    .stream(topics.totalConsumption, Consumed.with(StringSerde(), Serdes.TotalConsumption))
    // repartition into a fixed number of partitions; this creates an internal "-repartition" topic
    .repartition(Repartitioned.numberOfPartitions<String?, TotalConsumption?>(REPARTITION_COUNT).withName("consumption-tracking-r"))
    // re-key by advert id so the stream can be joined on that key
    .selectKey({ _, value -> value.advertId }, Named.`as`("consumption-tracking-sk"))

Unfortunately, our Kafka user doesn't have the DeleteRecords permission for security reasons, so Kafka Streams cannot purge records after they have been consumed.

After some research we understood that we can't disable the purge behaviour of Kafka Streams for repartition topics, so we decided to set the repartition.purge.interval.ms config to a very high value (Long.MAX_VALUE) so that the purge effectively never runs.

config[StreamsConfig.REPARTITION_PURGE_INTERVAL_MS_CONFIG] = Long.MAX_VALUE

We set the retention.ms config on the repartition topics so that records are cleaned up by broker-side retention instead.
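For reference, here is a rough sketch of one way both settings can be applied through the Streams configuration; the 24-hour value is only an example (not our real setting), and the "topic."-prefixed override is only picked up when Kafka Streams itself creates the internal topics:

import org.apache.kafka.common.config.TopicConfig
import org.apache.kafka.streams.StreamsConfig

val config = mutableMapOf<String, Any>()

// effectively never trigger Kafka Streams' DeleteRecords-based purge
config[StreamsConfig.REPARTITION_PURGE_INTERVAL_MS_CONFIG] = Long.MAX_VALUE

// rely on broker-side time retention for the internal repartition topics instead
// (24h here is just an illustrative value)
config[StreamsConfig.topicPrefix(TopicConfig.RETENTION_MS_CONFIG)] = 24 * 60 * 60 * 1000L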

But I don't know how the purge flow works under the hood, so I'm not sure whether setting repartition.purge.interval.ms to a very high value could cause problems for us later on.

For example, does Kafka Streams create a timer for each message or batch that would then never fire, causing memory problems in the long term?

These are just assumptions; as I said, I don't know how the purge flow works internally.

Upvotes: 1

Views: 30

Answers (1)

bbejeck

Reputation: 1370

The main consequence of setting the purge interval to a high value is that the repartition topics will continue to grow in size. There are no per-record timers involved: Kafka Streams simply issues a delete-records request for already-committed offsets once the purge interval has elapsed, so a huge interval just means that request is (effectively) never sent. Since you've set the retention.ms config to a lower value than the default (I'm assuming so here), you should be fine.

Upvotes: 2
