Reputation: 26567
I need a way to force compaction of the __consumer_offsets topic. In a test environment I deleted the cleaner-offset-checkpoint file, and Kafka then deleted many segments, as you can see below. Is it safe to delete this file in a production environment?
Before removing cleaner-offset-checkpoint:
# ls -la data/kafka/__consumer_offsets-*/*
-rw-r--r--. 1 root root 0 14 giu 17.20 data/kafka/__consumer_offsets-0/00000000000000000000.log
-rw-r--r--. 1 root root 35452 14 giu 17.45 data/kafka/__consumer_offsets-0/00000000000000018880.log
-rw-r--r--. 1 root root 38682 14 giu 17.54 data/kafka/__consumer_offsets-0/00000000000000021214.log
-rw-r--r--. 1 root root 0 14 giu 17.16 data/kafka/__consumer_offsets-10/00000000000000000000.log
-rw-r--r--. 1 root root 0 14 giu 17.20 data/kafka/__consumer_offsets-1/00000000000000000000.log
-rw-r--r--. 1 root root 35452 14 giu 17.37 data/kafka/__consumer_offsets-10/00000000000000018880.log
-rw-r--r--. 1 root root 74650 14 giu 17.55 data/kafka/__consumer_offsets-10/00000000000000021214.log
-rw-r--r--. 1 root root 35452 14 giu 17.44 data/kafka/__consumer_offsets-1/00000000000000018880.log
-rw-r--r--. 1 root root 47674 14 giu 17.54 data/kafka/__consumer_offsets-1/00000000000000021214.log
-rw-r--r--. 1 root root 0 14 giu 17.14 data/kafka/__consumer_offsets-11/00000000000000000000.log
-rw-r--r--. 1 root root 33325 14 giu 17.40 data/kafka/__consumer_offsets-11/00000000000000018781.log
-rw-r--r--. 1 root root 62264 14 giu 17.55 data/kafka/__consumer_offsets-11/00000000000000021119.log
-rw-r--r--. 1 root root 0 14 giu 17.12 data/kafka/__consumer_offsets-12/00000000000000000000.log
-rw-r--r--. 1 root root 35452 14 giu 17.37 data/kafka/__consumer_offsets-12/00000000000000018727.log
-rw-r--r--. 1 root root 74650 14 giu 17.55 data/kafka/__consumer_offsets-12/00000000000000021061.log
-rw-r--r--. 1 root root 0 14 giu 17.18 data/kafka/__consumer_offsets-13/00000000000000000000.log
-rw-r--r--. 1 root root 35452 14 giu 17.41 data/kafka/__consumer_offsets-13/00000000000000018880.log
-rw-r--r--. 1 root root 65658 15 giu 09.52 data/kafka/__consumer_offsets-13/00000000000000021214.log
...
After removing cleaner-offset-checkpoint (as you can see, there are fewer segments per partition):
# ls -la data/kafka/__consumer_offsets-*/*
-rw-r--r--. 1 root root 35452 14 giu 17.45 data/kafka/__consumer_offsets-0/00000000000000000000.log
-rw-r--r--. 1 root root 38682 14 giu 17.54 data/kafka/__consumer_offsets-0/00000000000000021214.log
-rw-r--r--. 1 root root 35452 14 giu 17.37 data/kafka/__consumer_offsets-10/00000000000000000000.log
-rw-r--r--. 1 root root 35452 14 giu 17.44 data/kafka/__consumer_offsets-1/00000000000000000000.log
-rw-r--r--. 1 root root 74650 14 giu 17.55 data/kafka/__consumer_offsets-10/00000000000000021214.log
-rw-r--r--. 1 root root 47674 14 giu 17.54 data/kafka/__consumer_offsets-1/00000000000000021214.log
-rw-r--r--. 1 root root 33325 14 giu 17.40 data/kafka/__consumer_offsets-11/00000000000000000000.log
-rw-r--r--. 1 root root 62264 14 giu 17.55 data/kafka/__consumer_offsets-11/00000000000000021119.log
-rw-r--r--. 1 root root 35452 14 giu 17.37 data/kafka/__consumer_offsets-12/00000000000000000000.log
-rw-r--r--. 1 root root 74650 14 giu 17.55 data/kafka/__consumer_offsets-12/00000000000000021061.log
-rw-r--r--. 1 root root 35452 14 giu 17.41 data/kafka/__consumer_offsets-13/00000000000000000000.log
-rw-r--r--. 1 root root 65658 15 giu 09.52 data/kafka/__consumer_offsets-13/00000000000000021214.log
...
Upvotes: 0
Views: 1796
Reputation: 4105
cleaner-offset-checkpoint is in the Kafka logs directory. This file keeps the last cleaned offset of each topic partition on the broker, in the following form:
<topic name> <partition number> <last cleaned offset>
When Kafka starts compaction, it reads this offset to decide how many messages are clean and how many are dirty, and uses those counts to calculate the dirty ratio.
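To illustrate the idea, here is a rough sketch of how a dirty ratio could be derived from the last cleaned offset. It is not Kafka's actual implementation: the function name, the (base offset, size) segment tuples, and the whole-segment approximation are assumptions made for illustration, and it ignores details such as the active segment.

```python
def dirty_ratio(segments, last_cleaned_offset):
    """Approximate dirty ratio for one partition.

    segments: list of (base_offset, size_bytes) tuples (hypothetical input).
    last_cleaned_offset: value from the checkpoint file, or None if missing.
    """
    # With no checkpoint entry, nothing counts as cleaned yet,
    # so every segment is treated as dirty.
    first_dirty = 0 if last_cleaned_offset is None else last_cleaned_offset
    clean = sum(size for base, size in segments if base < first_dirty)
    dirty = sum(size for base, size in segments if base >= first_dirty)
    total = clean + dirty
    return dirty / total if total else 0.0


# Hypothetical sizes, not taken from the listing in the question.
segments = [(0, 100), (18880, 300), (21214, 600)]
print(dirty_ratio(segments, 21214))  # checkpoint present
print(dirty_ratio(segments, None))   # checkpoint entry missing
```

With the checkpoint entry missing, the ratio is at its maximum, which is why the cleaner picks the partition up right away.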
If you clear this file, all logs are treated as dirty because there is no last cleaned offset to read. Compaction will then run for every topic partition for which the broker is the leader. If you only need __consumer_offsets to be compacted, remove just the records that belong to the __consumer_offsets topic partitions.
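As a sketch of that selective cleanup, the snippet below drops only the __consumer_offsets entries from the checkpoint contents. It assumes the file starts with a version line and an entry-count line, followed by one "<topic name> <partition number> <last cleaned offset>" entry per line; check this against your own file before relying on it. The function name is hypothetical, and the sketch assumes the broker is stopped while the file is edited.

```python
def drop_topic_entries(checkpoint_text, topic="__consumer_offsets"):
    """Return checkpoint contents without the entries for `topic`.

    Assumed layout: line 1 = version, line 2 = number of entries,
    then one "<topic> <partition> <offset>" entry per line.
    """
    lines = [line for line in checkpoint_text.splitlines() if line.strip()]
    version, entries = lines[0], lines[2:]
    # Keep every entry whose topic name differs from the one to reset.
    kept = [entry for entry in entries if entry.split()[0] != topic]
    # Rewrite the count line so it matches the remaining entries.
    return "\n".join([version, str(len(kept)), *kept]) + "\n"


# Made-up sample contents for illustration only.
sample = (
    "0\n"
    "3\n"
    "__consumer_offsets 0 21214\n"
    "my-topic 0 42\n"
    "__consumer_offsets 1 21214\n"
)
print(drop_topic_entries(sample), end="")
```

Back up the original file first, then write the returned text back to cleaner-offset-checkpoint in the same directory.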
Note that at broker startup the brokers will be busy, because log compaction starts again from the beginning. Other than that, there won't be any issue.
Upvotes: 1