n99

Reputation: 421

Kafka log compaction pointers

Reading about log compaction on a topic, I was wondering whether there is any way for a consumer to get hold of the following position/offset:

Basically, the point at which the compacted and non-compacted parts of the log meet.

I've read that there is a cleaner-offset-checkpoint file that sits on the broker at /var/lib/kafka/data/cleaner-offset-checkpoint, but is the info in this file available to a consumer?

My use case is a consumer that will consume compacted keys one way and non-compacted keys another way.

Thanks for any advice.

UPDATE:

Thinking, for example, of a topic holding various customer events like the one here: https://www.confluent.io/blog/put-several-event-types-kafka-topic/ (new customer, customer updates name, customer updates address, etc.). Log compaction, I believe, will leave one event per customer in the tail but still many events per customer in the head (assuming compaction is slower than message production?). A new consumer of this topic would have to treat all compacted messages as CREATEs, but then treat non-compacted messages as their more fine-grained events? In any case, I was wondering whether a consumer could tell how far along a topic's compaction has got at any given time.

Upvotes: 0

Views: 319

Answers (1)

OneCricketeer

Reputation: 191983

No, it's not possible with the Consumer API.

If you want to check that checkpoint file on disk, you could use an SSH library such as JSch, for example, to access a broker and read the file. If it has offset data, you could then use the consumer's seek methods, but keep in mind that the Log Cleaner thread may be actively running when you seek to or consume that data.
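Only as a rough sketch of the second half of that idea: assuming you had already pulled an offset out of that checkpoint file by some external means (the cleanerOffset value, topic, and partition below are placeholders), the consumer side is just an assign plus a seek:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class SeekFromCheckpoint {
    public static void main(String[] args) {
        // Hypothetical offset obtained out-of-band (e.g. over SSH) from
        // /var/lib/kafka/data/cleaner-offset-checkpoint on the broker.
        long cleanerOffset = 1234L;

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("customers", 0);
            consumer.assign(Collections.singletonList(tp));
            // Jump to the externally obtained offset; note the Log Cleaner
            // may already have moved past this point by the time we poll.
            consumer.seek(tp, cleanerOffset);

            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d key=%s value=%s%n",
                        record.offset(), record.key(), record.value());
            }
        }
    }
}
```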

A new consumer of this topic would have to treat all compacted messages as CREATEs, but then treat non-compacted messages as their more fine-grained events?

I don't think this is a valid use case. For a stream of customer updates, you'd just update a customer model in a table via a streaming reduce function. If any consumer restarts, it always has to read from the beginning of the topic to rebuild its local state and then continue reading any updates to those stored values, so it doesn't make sense to skip past them all, or to have two separate consumers.
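A minimal Kafka Streams sketch of that reduce, assuming plain string values and made-up topic names (customer-events in, customer-snapshots out); the latest event per key wins, whether it came from the compacted tail or the head:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

import java.util.Properties;

public class CustomerTable {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "customer-table-builder");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> events = builder.stream("customer-events");

        // Reduce per customer key: keep the most recent event, so the table
        // always holds the current view of each customer.
        KTable<String, String> customers = events
                .groupByKey()
                .reduce((current, update) -> update);

        customers.toStream().to("customer-snapshots");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```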

I also don't necessarily think you need different models. Some UUID would be unique, and every event can contain the full model of a "customer". Most fields can remain optional/nullable until they are provided in a new message with all those fields set (or not), and this defines a batch update since you can set/update/remove multiple attributes at once. If you need more granularity, that's also possible to define at the producer level by storing and looping over your attributes and producing individual "customer" objects with each new attribute.
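For example, a rough producer-side sketch of that looping idea; the topic name, customer id, attribute map, and JSON layout are all made up for illustration:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class AttributeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical per-customer attribute changes, emitted one at a time.
        String customerId = "customer-42";
        Map<String, String> changedAttributes = new LinkedHashMap<>();
        changedAttributes.put("name", "Jane Doe");
        changedAttributes.put("address", "1 Main St");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (Map.Entry<String, String> attr : changedAttributes.entrySet()) {
                // One record per attribute: each message is still a full
                // "customer" object keyed by customer id, with only the
                // changed field populated (JSON strings used for brevity).
                String value = String.format("{\"id\":\"%s\",\"%s\":\"%s\"}",
                        customerId, attr.getKey(), attr.getValue());
                producer.send(new ProducerRecord<>("customer-events", customerId, value));
            }
            producer.flush();
        }
    }
}
```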

Upvotes: 1
