VB_

Reputation: 45702

Broker disk usage after topic deletion

I'm using Apache Kafka. I dump huge databases into Kafka, where each database table becomes a topic.

I cannot delete a topic before it has been completely consumed. I cannot set a time-based retention policy because I don't know when each topic will be consumed. I have limited disk space and too much data, so I have to write code that orchestrates consumption and deletion programmatically. I understand that the problem appears because we're using Kafka for batch processing, but I can't change the technology stack.
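One way such orchestration code could decide that a topic is "completely consumed" is to compare each partition's log-end offset with the consumer group's committed offset: when no partition has remaining lag, the topic is safe to delete. Here's a minimal sketch of that check using plain maps keyed by partition id; in a real application you would fill them from the Kafka client (e.g. `Consumer#endOffsets` and the group's committed offsets), and the class and method names below are otherwise hypothetical:

```java
import java.util.Map;

public class ConsumptionCheck {

    /**
     * Returns true when every partition's committed offset has reached
     * the log-end offset, i.e. the consumer group has no remaining lag.
     * A partition with no committed offset at all counts as unconsumed.
     */
    static boolean fullyConsumed(Map<Integer, Long> endOffsets,
                                 Map<Integer, Long> committedOffsets) {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            if (committed == null || committed < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = Map.of(0, 100L, 1, 50L);
        System.out.println(fullyConsumed(end, Map.of(0, 100L, 1, 50L))); // true
        System.out.println(fullyConsumed(end, Map.of(0, 100L, 1, 49L))); // false
    }
}
```

Only once this returns true for a topic would the orchestrator issue the delete call.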

What is the correct way to delete a consumed topic from the brokers?

Currently, I'm calling kafka.admin.AdminUtils#deleteTopic, but I can't find clear documentation for it. The method signature doesn't take any Kafka broker URLs. Does that mean I'm only deleting the topic's metadata, and the brokers' disk usage isn't reduced? If so, when does the actual deletion of the append-only log files happen?

Upvotes: 1

Views: 303

Answers (1)

Simon Clark

Reputation: 654

Instead of using a time-based retention policy, are you able to use a size-based policy? log.retention.bytes is a per-partition setting that might help you out here.
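For illustration, a size-based policy could look like this in the broker's server.properties (the 1 GiB value is purely an example; at the topic level the equivalent override is named retention.bytes rather than log.retention.bytes):

```
# server.properties – broker-wide default, applied per partition
# (~1 GiB here, illustrative only)
log.retention.bytes=1073741824

# make sure segments are deleted, not compacted
log.cleanup.policy=delete
```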

I'm not sure how you'd want to determine that a topic is fully consumed, but calling deleteTopic against a topic initially only marks it for deletion. Once there are no consumers or producers connected to the cluster and accessing that topic, and provided delete.topic.enable is set to true in your server.properties file, the controller will delete the topic from the cluster as soon as it is able to. This includes purging the data from disk. It can take anywhere from a few seconds to several minutes.
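For reference, the broker setting mentioned above is a one-line entry in server.properties (it has to be set explicitly on older Kafka versions, where it defaulted to false):

```
# server.properties – allow the controller to actually delete
# topics marked for deletion, including their log files on disk
delete.topic.enable=true
```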

Upvotes: 1
