Reputation: 2590
We set the log retention hours to 1 hour (the previous setting was 72 hours). Using the Kafka command line tools, we set retention.ms
to 1 hour. Our aim is to purge the data that is older than 1 hour in the topic topic_test
, so we used the following command:
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config retention.ms=3600000
and also
kafka-topics.sh --zookeeper localhost:2181 --alter \
--topic topic_test \
--config retention.ms=3600000
Both commands ran without errors.
But the problem is that data older than 1 hour still remains! In fact, no data was removed from the topic_test
partitions. We have an HDP Kafka cluster (version 1.0.x) managed by Ambari.
We do not understand why the data on the topic topic_test
still remains and has not decreased, even after we ran both CLI commands as described.
What is wrong with the following Kafka CLI commands?
kafka-configs.sh --alter --zookeeper localhost:2181 --entity-type topics --entity-name topic_test --add-config retention.ms=3600000
kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_test --config retention.ms=3600000
From the Kafka server.log
we can see the following:
[2020-07-28 14:47:27,394] INFO Processing override for entityPath: topics/topic_test with config: Map(retention.bytes -> 2165441552, retention.ms -> 3600000) (kafka.server.DynamicConfigManager)
[2020-07-28 14:47:27,397] WARN retention.ms for topic topic_test is set to 3600000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.server.TopicConfigHandler)
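For reference, the applied override can also be checked with kafka-configs.sh --describe (a sketch, assuming the same Zookeeper endpoint as above):
kafka-configs.sh --describe \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test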
reference - https://ronnieroller.com/kafka/cheat-sheet
Upvotes: 6
Views: 7267
Reputation: 18475
The log cleaner will only work on inactive (sometimes also referred to as "old" or "clean") segments. As long as all the data fits into the active ("dirty", "unclean") segment, whose size is defined by segment.bytes
, there will be no cleaning happening.
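You can see this on disk by listing the segment files of one partition (a sketch; the log directory path is an assumption, check log.dirs in your broker configuration):
ls -lh /kafka-logs/topic_test-0/
# The .log file with the highest starting offset is the active segment;
# only the older, closed segments are eligible for deletion by retention.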
The configuration cleanup.policy
is described as:
A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.
In addition, the segment.bytes
is:
This configuration controls the segment file size for the log. Retention and cleaning is always done a file at a time so a larger segment size means fewer files but less granular control over retention.
The configuration segment.ms
can also be used to steer the deletion:
This configuration controls the period of time after which Kafka will force the log to roll even if the segment file isn't full to ensure that retention can delete or compact old data.
As it defaults to one week, you might want to reduce it to fit your needs.
Therefore, if you want to set the retention of a topic to, e.g., one hour, you could set the following (a command sketch applying these settings comes after the list):
cleanup.policy=delete
retention.ms=3600000
segment.ms=3600000
file.delete.delay.ms=1 (The time to wait before deleting a file from the filesystem)
segment.bytes=1024
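As a sketch, these topic-level settings could be applied in one call with the same Zookeeper-based tooling used in the question (values taken from the list above; segment.bytes=1024 is deliberately tiny so segments roll quickly):
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name topic_test \
--add-config cleanup.policy=delete,retention.ms=3600000,segment.ms=3600000,file.delete.delay.ms=1,segment.bytes=1024
Once the active segment rolls, the periodic retention check (log.retention.check.interval.ms, 5 minutes by default) should start deleting the old segments.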
Note: I am not referring to retention.bytes
. The segment.bytes
is a very different thing, as described above. Also, be aware that log.retention.hours
is a cluster-wide configuration. So, if you plan to have different retention times for different topics, setting these overrides at the topic level will solve it.
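For example (a sketch; another_topic is a hypothetical topic name), a per-topic override lets one topic keep data longer than topic_test while the cluster-wide log.retention.hours stays untouched:
kafka-configs.sh --alter \
--zookeeper localhost:2181 \
--entity-type topics \
--entity-name another_topic \
--add-config retention.ms=86400000
# 24 hours for this topic only; topics without an override keep the broker default.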
Upvotes: 16