Reputation: 1
I'm using Kafka version 2.3 and I want to delete old Kafka logs.
There are two folders:
log.dirs=/var/www/html/zookeeper_1/zookeeper_data_1
kafka_2.10-0.8.2.2/logs
What is the difference between the two folders, and how can I delete the old logs?
Upvotes: 0
Views: 2672
Reputation: 191844
One is ZooKeeper data; the other is Kafka 0.8.2.2 data, which is not directly compatible with Kafka 2.3.
You could delete segments from the latter, but doing so can corrupt the topic, so you should let Kafka clean itself up.
Upvotes: 0
Reputation: 39860
I would argue that the safest way to delete older logs is to properly configure your retention policy.
In Kafka, there are two types of log retention: size-based and time-based. The former is controlled by log.retention.bytes, the latter by log.retention.hours.
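To make the difference between the two retention types concrete, here is a simplified Python model of a single retention check on one partition. It is a sketch under stated assumptions, not Kafka's actual code: segments are ordered oldest first, the last one is the active segment and is never deleted, `age_hours` stands for the age of the newest record in a segment, and `None` disables a retention type. `Segment` and `segments_to_delete` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

MB = 1024 * 1024


@dataclass
class Segment:
    size_bytes: int
    age_hours: float  # assumption: age of the newest record in this segment


def segments_to_delete(segments: List[Segment],
                       retention_bytes: Optional[int],
                       retention_hours: Optional[float]) -> int:
    """Return how many of the oldest segments one retention check would delete.

    Simplified model: segments are ordered oldest first, the last one is the
    active segment and is never deleted, and None disables a retention type.
    """
    closed = len(segments) - 1  # the active segment is excluded
    deleted = 0
    # Time-based retention: drop segments, oldest first, until one is still fresh.
    if retention_hours is not None:
        while deleted < closed and segments[deleted].age_hours > retention_hours:
            deleted += 1
    # Size-based retention: keep dropping the oldest segment as long as the
    # partition would still hold at least retention_bytes afterwards.
    if retention_bytes is not None:
        excess = sum(s.size_bytes for s in segments[deleted:]) - retention_bytes
        while deleted < closed and excess - segments[deleted].size_bytes >= 0:
            excess -= segments[deleted].size_bytes
            deleted += 1
    return deleted


# Three full 512MB segments plus a 100MB active segment, 1GB size limit, no time limit.
log = [Segment(512 * MB, 1), Segment(512 * MB, 1), Segment(512 * MB, 1), Segment(100 * MB, 0)]
print(segments_to_delete(log, 1024 * MB, None))  # 1
```

Only the oldest segment is removed here: dropping a second one would leave 612MB, which is below the 1GB limit.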
Assuming that you want a delete cleanup policy, you need to configure the following parameters:
log.cleaner.enable=true
log.cleanup.policy=delete
Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, take the following factors into consideration:
log.retention.bytes is a minimum guarantee for a single partition of a topic: if you set log.retention.bytes to 512MB, you will always keep at least 512MB of data (per partition) on your disk.
Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (the default value), then just before the retention check runs, a partition can hold 512MB of data plus whatever was produced during that 5-minute window.
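The arithmetic in the previous paragraph can be sketched as a small helper. It assumes a constant produce rate per partition (a hypothetical input, not something Kafka reports directly) and uses 300000 ms, the default of log.retention.check.interval.ms. `data_before_check` is a hypothetical name.

```python
MB = 1024 * 1024


def data_before_check(retention_bytes: int,
                      produce_bytes_per_sec: float,
                      check_interval_ms: int = 300_000) -> float:
    """Approximate data held by one partition just before the next retention check.

    Assumes the partition sits at retention_bytes right after a check and then
    grows at a constant (hypothetical) produce rate until the next check.
    """
    growth = produce_bytes_per_sec * check_interval_ms / 1000  # bytes written between checks
    return retention_bytes + growth


# 512MB retention, 1MB/s produced into the partition, default 5-minute check interval.
print(data_before_check(512 * MB, 1 * MB) / MB)  # 812.0
```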
A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments that together reach the retention limit, plus a 3rd, active segment to which data is currently written).
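The segment count can be estimated with a short sketch. It assumes that right after a retention check every closed segment is full (log.segment.bytes each) and that the oldest one survived only because removing it would drop the partition below log.retention.bytes; one more segment is the active one. Between checks, extra segments can roll, so read the result as the count right after a check. `max_segments_on_disk` is a hypothetical name.

```python
MB = 1024 * 1024


def max_segments_on_disk(retention_bytes: int, segment_bytes: int) -> int:
    """Maximum segments per partition right after a size-based retention check.

    Closed segments needed to cover retention_bytes (integer ceiling division),
    plus one active segment that is currently being written to.
    """
    closed = (retention_bytes + segment_bytes - 1) // segment_bytes
    return closed + 1


print(max_segments_on_disk(1024 * MB, 512 * MB))  # 3
print(max_segments_on_disk(1024 * MB, 300 * MB))  # 5
```

The first call reproduces the 1GB / 512MB example above; the second shows that when the retention limit is not a multiple of the segment size, one more closed segment is kept.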
Finally, you should do the math to compute the maximum disk space Kafka logs might occupy at any given time, and tune the aforementioned parameters accordingly. I would also advise setting a time-based retention policy and configuring log.retention.hours accordingly. If you don't need your data after 2 days, set log.retention.hours=48.
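As a rough sketch of that math for one broker: right after a check, a partition holds less than log.retention.bytes plus one segment (otherwise the oldest segment would have been deleted), and it then grows by whatever is produced until the next check. The partition count and produce rate below are hypothetical inputs, `broker_disk_upper_bound` is a hypothetical name, and the estimate ignores index files and other per-segment overhead.

```python
MB = 1024 * 1024


def broker_disk_upper_bound(partitions: int,
                            retention_bytes: int,
                            segment_bytes: int,
                            produce_bytes_per_sec: float,
                            check_interval_ms: int = 300_000) -> float:
    """Rough upper bound on log data for `partitions` partition replicas on one broker."""
    per_partition = (retention_bytes  # data the size policy always keeps
                     + segment_bytes  # slack: up to one extra segment survives a check
                     + produce_bytes_per_sec * check_interval_ms / 1000)  # growth between checks
    return partitions * per_partition


# Hypothetical workload: 10 partitions, 1GB retention, 512MB segments, 1MB/s each.
print(broker_disk_upper_bound(10, 1024 * MB, 512 * MB, 1 * MB) / MB)  # 18360.0
```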
Upvotes: 2