flamengo

Reputation: 13

delete topic-messages in Apache kafka

I'm testing the behaviour of kafka-topics, but I don't understand how the deletion works.

I have created a simple topic with

retention.ms = 60000

and

segment.ms = 60000

and

cleanup.policy=delete.

After this I created a producer and sent some messages. A consumer receives the messages without problems. But I expected that, after one minute, if I run the consumer again, it would not show the messages, because they should have been deleted. This behaviour doesn't occur.

If I create a query in ksql, it's the same: the messages always appear.

I think that I don't understand how the deletion works.

Example:

1) Topic

./kafka-topics --create --zookeeper localhost:2181 --topic test \
  --replication-factor 2 --partitions 1 \
  --config "cleanup.policy=delete" \
  --config "delete.retention.ms=60000" \
  --config "segment.ms=60000"
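To double-check which config overrides were actually applied to the topic, it can be described; a sketch, assuming the same localhost:2181 ZooKeeper as in the create command:

```shell
# Show the topic's partition layout and its per-topic config overrides
./kafka-topics --describe --zookeeper localhost:2181 --topic test

# Or list only the dynamic config overrides set on the topic
./kafka-configs --describe --zookeeper localhost:2181 \
  --entity-type topics --entity-name test
```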

2) producer

./kafka-avro-console-producer --broker-list broker:29092 --topic test \
  --property parse.key=true --property key.schema='{"type":"long"}' \
  --property "key.separator=:" \
  --property value.schema='{"type": "record","name": "ppp","namespace": "test.topic","fields": [{"name": "id","type": "long"}]}'

3) messages from producer

1:{"id": 1}
2:{"id": 2}
4:{"id": 4}
5:{"id": 5}

4) Consumer

  ./kafka-avro-console-consumer \
    --bootstrap-server broker:29092 \
    --property schema.registry.url=http://localhost:8081 \
    --topic test --from-beginning --property print.key=true

The consumer shows the four messages.

But I expected that if I run the consumer again after one minute (I have waited longer too, even hours), the messages would not show, because retention.ms and segment.ms are one minute.

When are messages actually deleted?

Upvotes: 1

Views: 1163

Answers (2)

Aaron_ab

Reputation: 3758

Another important thing to know about the deletion process in Kafka is the log segment file:

Topics are divided into partitions, right? This is what allows parallelism, scaling, etc.

Each partition is divided into log segment files. Why? Because Kafka writes data to disk, and we don't want to keep the entire topic/partition in one huge file, but rather split it into smaller files (segments).

Breaking data into smaller files has many advantages, not really related to the question. You can read more here.

The key thing to notice here is:

The retention policy looks at the log segment file's timestamp.

"Retention by time is performed by examining the last modified time (mtime) on each log segment file on disk. Under normal cluster operations, this is the time that the log segment was closed, and represents the timestamp of the last message in the file"

(From Kafka-definitive Guide, page 26)

Version 0.10.1.0

The log retention time is no longer based on last modified time of the log segments. Instead it will be based on the largest timestamp of the messages in a log segment.

Which means it only looks at closed log segment files. Make sure your segment config params are right.
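As a concrete check, you can look at the segment files on the broker's disk: the active (open) segment is never deleted, and the broker only scans for expired segments periodically (log.retention.check.interval.ms, 5 minutes by default), so deletion lags the retention time. A sketch, assuming the broker stores its logs under /var/lib/kafka/data (adjust to your log.dirs):

```shell
# Each partition directory holds one or more segment files;
# the highest-offset .log file is the active segment and is never deleted.
ls -l /var/lib/kafka/data/test-0/

# Dump a segment's records and timestamps, which retention is compared against
./kafka-run-class kafka.tools.DumpLogSegments \
  --files /var/lib/kafka/data/test-0/00000000000000000000.log \
  --print-data-log
```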

Upvotes: 1

Piyush Patel

Reputation: 1751

Change the retention.ms as mentioned by Ajay Srivastava above:

    kafka-topics --zookeeper localhost:2181 --alter --topic test --config retention.ms=60000

and test again.
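The same change can also be made with kafka-configs, which is the non-deprecated way to alter topic configs (kafka-topics --alter --config is deprecated on newer brokers); a sketch, assuming the same localhost:2181 ZooKeeper as in the question:

```shell
# Set retention.ms (the message retention time) as a topic config override
./kafka-configs --alter --zookeeper localhost:2181 \
  --entity-type topics --entity-name test \
  --add-config retention.ms=60000
```

Note that the question's create command sets delete.retention.ms, which only controls how long delete tombstones are kept on compacted topics; retention.ms is the setting that expires regular messages.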

Upvotes: 0
