Reputation: 773
I have made the changes in server.properties file in Kafka 0.8.1.1 i.e. added log.cleaner.enable=true
and also enabled cleanup.policy=compact
while creating the topic.
Now when I am testing it, I pushed the following messages to the topic with following (Key, Message).
Now I pushed the 4th message with a same key as an earlier input, but changed the message. Here log compaction should come into picture. And using Kafka tool, I can see all the 4 offsets in the topic. How can I know whether log compaction is working or not? Should the earlier message be deleted, or the log compaction is working fine as the new message has been pushed.
Does it have to do anything with the log.retention.hours
or topic.log.retention.hours
or log.retention.size
configurations? What is the role of these configs in log compaction.
P.S. - I have thoroughly gone through the Apache Documentation, but still it is not clear.
Upvotes: 15
Views: 16069
Reputation: 11
It is a good point to take a look also on log.roll.hours
, which by default is 168 hours. In simple words: even in case you have not so active topic and you are not able to fill the max segment size (by default 1G for normal topics and 100M for offset topic) in a week you will have a closed segment with size below log.segment.bytes
. This segment can be compacted on next turn.
Upvotes: 1
Reputation: 9725
You can do it with kafka-topics CLI.
I'm running it from docker(confluentinc/cp-enterprise-kafka:6.0.0
).
$ docker-compose exec kafka kafka-topics --zookeeper zookeeper:32181 --describe --topic count-colors-output
Topic: count-colors-output PartitionCount: 1 ReplicationFactor: 1 Configs: cleanup.policy=compact,segment.ms=100,min.cleanable.dirty.ratio=0.01,delete.retention.ms=100
Topic: count-colors-output Partition: 0 Leader: 1 Replicas: 1 Isr: 1
but don't get confused if you don't see anything in Config field. It happens if default values were used. So, unless you see cleanup.policy=compact
in the output - the topic is not compacted.
Upvotes: 0
Reputation: 388
In order check a Topics property from CLI you can do it using Kafka-topics cmd :
https://grokbase.com/t/kafka/users/14aev0snbd/command-line-tool-for-topic-metadata
Upvotes: 1
Reputation: 213
even though this question is a few months old, I just came across it doing research for my own question. I had created a minimal example for seeing how compaction works with Java, maybe it is helpful for you too:
https://gist.github.com/anonymous/f78184eaeec3ee82b15182aec24a432a
Furthermore, consulting the documentation, I used the following configuration on a topic level for compaction to kick in as quickly as possible:
min.cleanable.dirty.ratio=0.01
cleanup.policy=compact
segment.ms=100
delete.retention.ms=100
When run, this class shows that compaction works - there is only ever one message with the same key on the topic.
With the appropriate settings, this would be reproducible on command line.
Upvotes: 16
Reputation: 773
Actually, the log compaction is visible only when the number of logs reaches to a very high count eg 1 million. So, if you have that much data, it's good. Otherwise, using configuration changes, you can reduce this limit to say 100 messages, and then you can see that out of the messages with the same keys, only the latest message will be there, the previous one will be deleted. It is better to use log compaction if you have full snapshot of your data everytime, otherwise you may loose the previous logs with the same associated key, which might be useful.
Upvotes: 4