Reputation: 1807
I have an Apache Kafka cluster with the retention policy set to delete and the retention period set to 24 hours. I then changed the retention period dynamically to 1 minute for one specific topic. But the old messages are still there, so I have several questions:
Upvotes: 4
Views: 1652
Reputation: 1727
On each broker, partitions are divided into log segments. By default a segment holds up to 1 GB of data (log.segment.bytes). In addition, a new log segment is rolled by default every 7 days (log.roll.hours).
Each broker schedules a cleaner thread that periodically checks which segments are eligible for deletion. By default, this check runs every 5 minutes (configurable through the broker setting log.retention.check.interval.ms).
A segment is removable if the most recent message within that segment is older than the configured retention period. In addition, the active segment (the one the broker is currently writing to) can't be deleted.
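The removability rule above can be sketched as a small Python model. This is a simplified illustration of the rule as described in this answer, not Kafka's actual implementation; the Segment class, its field names, and deletable_segments are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical model of a log segment, for illustration only.
    base_offset: int
    newest_timestamp_ms: int  # timestamp of the most recent message in the segment
    active: bool = False      # True for the segment currently being written to


def deletable_segments(segments: List[Segment], now_ms: int, retention_ms: int) -> List[Segment]:
    """Return the segments a time-based retention check would treat as removable."""
    return [
        s for s in segments
        # The active segment is never removable; a closed segment is removable
        # once its newest message is older than the retention period.
        if not s.active and now_ms - s.newest_timestamp_ms > retention_ms
    ]


now = 1_000_000
segments = [
    Segment(0, now - 120_000),                # closed, newest message 2 minutes old
    Segment(100, now - 90_000),               # closed, newest message 90 seconds old
    Segment(200, now - 10_000, active=True),  # still being written to
]
old = deletable_segments(segments, now, retention_ms=60_000)
print([s.base_offset for s in old])  # [0, 100]
```

With a 1 minute retention period, both closed segments qualify, while the active segment stays regardless of how old its messages are. This is why messages can outlive a freshly shortened retention period.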
To have segments removed as soon as possible, configure log rolling in line with your retention period. For example, if your retention period is 24 hours, it could be a good idea to set log.roll.hours to 1.
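To see why the roll interval matters, here is a rough back-of-the-envelope sketch in Python. It assumes the model described in this answer (time-based rolling only, deletion only of closed segments, one periodic check) and ignores size-based rolling; worst_case_message_age_ms is a hypothetical helper, and the result is an approximate upper bound rather than a guarantee.

```python
MINUTE_MS = 60_000
HOUR_MS = 60 * MINUTE_MS
DAY_MS = 24 * HOUR_MS


def worst_case_message_age_ms(retention_ms: int, roll_ms: int, check_interval_ms: int) -> int:
    """Approximate upper bound on how long a message can stay in the log.

    A message written right after a roll waits up to roll_ms for its segment
    to close, then up to retention_ms after the segment's last message, then
    up to check_interval_ms for the next retention check.
    """
    return roll_ms + retention_ms + check_interval_ms


# Retention of 1 minute with the default 7 day roll and 5 minute check interval.
default_roll = worst_case_message_age_ms(1 * MINUTE_MS, 7 * DAY_MS, 5 * MINUTE_MS)
print(default_roll // MINUTE_MS)  # 10086 minutes, i.e. just over 7 days

# Same retention, but with segments rolled every minute (assumed setting).
fast_roll = worst_case_message_age_ms(1 * MINUTE_MS, 1 * MINUTE_MS, 5 * MINUTE_MS)
print(fast_roll // MINUTE_MS)  # 7 minutes
```

Under these assumptions, shortening the retention period alone changes little while the roll interval stays at its default, because the segment holding the old messages is not closed yet.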
Note that segment deletion can happen at different times on each broker, since each broker schedules its cleaner thread independently.
You can check the configuration of a specific topic with the kafka-configs script. Example:
./bin/kafka-configs --describe --zookeeper localhost:2181 --entity-type topics --entity-name __consumer_offsets
Upvotes: 4
Reputation: 262
The retention policy is applied to closed segments only. If your segment is still active, the data in it won't be purged until that segment is closed and a new one is opened.
Upvotes: 2