Reputation: 39
We are using Kafka Streams in our application to process events.
To prepare for a join we use both the repartition and selectKey methods of Kafka Streams, which causes internal repartition topics to be created with -1 (infinite) retention.
streamsBuilder
    .stream(topics.totalConsumption, Consumed.with(StringSerde(), Serdes.TotalConsumption))
    // the named repartition (and the later key change) back a "-repartition" internal topic
    .repartition(Repartitioned.numberOfPartitions<String?, TotalConsumption?>(REPARTITION_COUNT).withName("consumption-tracking-r"))
    .selectKey({ _, value -> value.advertId }, Named.`as`("consumption-tracking-sk"))
Unfortunately, our Kafka users don't have the DeleteRecords permission for security reasons, so Kafka Streams cannot purge records after they are consumed.
After some research we understood that the purge behaviour of Kafka Streams for repartition topics can't be disabled, so we decided to set the repartition.purge.interval.ms config to a very high value (Long.MAX_VALUE) to effectively prevent purging.
config[StreamsConfig.REPARTITION_PURGE_INTERVAL_MS_CONFIG] = Long.MAX_VALUE
We set the retention.ms config on the repartition topics instead, so records are cleaned up by broker-side retention.
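For completeness, both settings can be applied from the application config. `StreamsConfig.topicPrefix` forwards a topic-level property to every internal topic Streams creates, so it can carry the retention override; the seven-day value below is purely illustrative, and `config` is the same properties map used above:

```kotlin
import org.apache.kafka.common.config.TopicConfig
import org.apache.kafka.streams.StreamsConfig
import java.time.Duration

// Effectively disable Streams-side purging (it would need the
// DeleteRecords permission, which our principals lack).
config[StreamsConfig.REPARTITION_PURGE_INTERVAL_MS_CONFIG] = Long.MAX_VALUE

// Forward retention.ms as a topic-level config to internal topics so the
// broker cleans them up instead. Note this prefix applies to ALL internal
// topics (changelogs included); 7 days is an example value, not a recommendation.
config[StreamsConfig.topicPrefix(TopicConfig.RETENTION_MS_CONFIG)] =
    Duration.ofDays(7).toMillis().toString()
```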
But I don't know how the purge flow works under the hood, so I'm not sure whether setting repartition.purge.interval.ms to a very high value could cause problems for us later.
For example, if Kafka Streams created a timer for each message or batch, a timer that never fires could cause memory problems in the long term.
These are just assumptions; as I said, I don't know how the purge flow works internally.
Upvotes: 1
Views: 30
Reputation: 1370
The main consequence of setting the purge interval to a high value is that the repartition topics will continue to grow in size, but since you've set the retention.ms
config to a lower value than the default (I'm assuming so here) you should be fine. As for the memory concern: Streams doesn't schedule a timer per record or batch. It simply checks at commit time whether the purge interval has elapsed since the last purge and, if so, issues a deleteRecords request up to the committed offset, so a huge interval just means that check never triggers and no per-record state accumulates.
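To confirm the retention override actually landed on the broker, you can describe the internal topic's configs with the stock CLI. The bootstrap address and topic name below are placeholders; the real name follows the `<application.id>-<operation-name>-repartition` pattern:

```shell
# Show the dynamic (per-topic) config overrides on the repartition topic.
# Replace the broker address and topic name with your own values.
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics \
  --entity-name my-app-consumption-tracking-r-repartition \
  --describe
```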
Upvotes: 2