rpli

Reputation: 73

The benefit of saving Kafka offset to an external storage

I'm working on saving consumer state during a Kafka rebalance. I noticed that the pseudo-code in the ConsumerRebalanceListener javadoc uses external storage to save offsets.

I want to know: is there any benefit for system safety or robustness (setting aside business-logic requirements) in using external storage instead of Kafka itself (Zookeeper or the __consumer_offsets topic) to manage offsets, e.g. better handling of network issues? Thanks.

Upvotes: 2

Views: 3689

Answers (2)

Joachim Lous

Reputation: 1541

One potential benefit is restoring from backup in a clean and clear way:

If your app stores state, and updates that state in the same transaction in which it stores the offset of the originating event, then restoring a database snapshot or backup is effectively a complete and consistent time travel for the whole app. It can keep processing the feed from its last known offset (the one in the backup), just as it would after a regular restart, and the stored state will remain consistent with the feed. The app never knows that some of the events are replays.
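
A minimal sketch of that idea, using SQLite from the Python standard library as a stand-in for the app's database and a plain list as a stand-in for a partition's log. The table names, the `apply_event` and `consume` helpers, and the integer events are all made up for illustration; nothing here talks to a real Kafka cluster.

```python
import sqlite3

def open_store():
    # Hypothetical schema: one row of app state, one row of consumer progress.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE balance (id INTEGER PRIMARY KEY CHECK (id = 1), total INTEGER NOT NULL)")
    db.execute("CREATE TABLE progress (id INTEGER PRIMARY KEY CHECK (id = 1), next_offset INTEGER NOT NULL)")
    db.execute("INSERT INTO balance VALUES (1, 0)")
    db.execute("INSERT INTO progress VALUES (1, 0)")
    db.commit()
    return db

def apply_event(db, offset, amount):
    # One transaction: the state change and the offset commit together, or not at all.
    with db:
        db.execute("UPDATE balance SET total = total + ? WHERE id = 1", (amount,))
        db.execute("UPDATE progress SET next_offset = ? WHERE id = 1", (offset + 1,))

def next_offset(db):
    return db.execute("SELECT next_offset FROM progress WHERE id = 1").fetchone()[0]

def total(db):
    return db.execute("SELECT total FROM balance WHERE id = 1").fetchone()[0]

def consume(db, feed):
    # Resume from whatever offset the database itself says comes next.
    for offset in range(next_offset(db), len(feed)):
        apply_event(db, offset, feed[offset])

feed = [10, 20, 30, 40]        # stand-in for the events in one partition

live = open_store()
consume(live, feed[:2])        # the app has processed offsets 0 and 1

backup = sqlite3.connect(":memory:")
live.backup(backup)            # snapshot taken here: state and offset travel together

consume(live, feed)            # the live app carries on to the end of the feed

restored = backup              # later: restore the snapshot and simply restart
consume(restored, feed)        # offsets 2 and 3 are replayed from the backup's offset
```

After the replay, the restored database holds the same total and the same next offset as the live one, because the offset it resumed from was captured in the same snapshot as the state it describes.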

It all depends on the overall design and value chain, of course. Downstream dependencies might not respond well to the replays, especially if the results depend on more than just the incoming events, and end up different than in the first pass.

Upvotes: 0

Mickael Maison

Reputation: 26885

The main use case for storing offsets outside of Kafka is when the consuming application needs to store the offsets and the consumed/processed messages together. This allows a single (ideally atomic) write to one system that carries both values, offsets and messages.

Otherwise, the application effectively needs to do 2 "writes" to store the messages in an external system and then store the offsets into Kafka.
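
To make the difference concrete, here is a rough sketch that simulates a crash in both designs. It uses SQLite as the external system and a plain dict as a stand-in for the offset committed to Kafka; the `Crash` exception, the table layout, and the helper names are assumptions for the example, not part of any Kafka API.

```python
import sqlite3

class Crash(Exception):
    """Simulated process failure."""

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE messages (msg_offset INTEGER, payload TEXT)")
    db.execute("CREATE TABLE offsets (id INTEGER PRIMARY KEY CHECK (id = 1), next_offset INTEGER)")
    db.execute("INSERT INTO offsets VALUES (1, 0)")
    db.commit()
    return db

def count(db):
    return db.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

def two_writes(db, committed, offset, payload, crash=False):
    # Write 1: store the message in the external system.
    with db:
        db.execute("INSERT INTO messages VALUES (?, ?)", (offset, payload))
    if crash:
        raise Crash()                      # dies between the two writes
    # Write 2: commit the offset elsewhere (dict standing in for Kafka).
    committed["next_offset"] = offset + 1

def single_write(db, offset, payload, crash=False):
    # Message and offset share one transaction; a failure rolls back both.
    with db:
        db.execute("INSERT INTO messages VALUES (?, ?)", (offset, payload))
        if crash:
            raise Crash()
        db.execute("UPDATE offsets SET next_offset = ? WHERE id = 1", (offset + 1,))

# Two writes: the crash leaves the message stored but the offset uncommitted.
db_a, committed = make_db(), {"next_offset": 0}
try:
    two_writes(db_a, committed, 0, "hello", crash=True)
except Crash:
    pass
two_writes(db_a, committed, committed["next_offset"], "hello")   # restart re-reads offset 0

# Single write: the crash leaves nothing behind, so the retry is clean.
db_b = make_db()
try:
    single_write(db_b, 0, "hello", crash=True)
except Crash:
    pass
resume_at = db_b.execute("SELECT next_offset FROM offsets WHERE id = 1").fetchone()[0]
single_write(db_b, resume_at, "hello")
```

In the two-write variant the message ends up stored twice after the restart; in the single-write variant it is stored exactly once, since the failed attempt was rolled back along with its offset.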

This is explained in more detail in the "Storing Offsets Outside Kafka" section of the KafkaConsumer javadoc.
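
The rebalance side of this follows the shape of the javadoc pseudo-code mentioned in the question: save each partition's position when it is revoked, and seek to the saved position when it is assigned. Below is a loose Python sketch of that pattern. `FakeConsumer` and the dict-based store are hypothetical stand-ins so the example runs without a broker; the listener only assumes a consumer object with `position(tp)` and `seek(tp, offset)` methods, and a real client's listener interface may differ in its details.

```python
class ExternalOffsetListener:
    """Save positions on revoke, seek to saved positions on assign."""

    def __init__(self, consumer, store):
        self.consumer = consumer   # assumed to expose position(tp) and seek(tp, offset)
        self.store = store         # hypothetical external store: dict keyed by partition

    def on_partitions_revoked(self, partitions):
        for tp in partitions:
            self.store[tp] = self.consumer.position(tp)

    def on_partitions_assigned(self, partitions):
        for tp in partitions:
            if tp in self.store:   # unknown partitions keep the consumer's default position
                self.consumer.seek(tp, self.store[tp])


class FakeConsumer:
    """In-memory stand-in for a consumer; not a real Kafka client."""

    def __init__(self):
        self.positions = {}

    def position(self, tp):
        return self.positions.get(tp, 0)

    def seek(self, tp, offset):
        self.positions[tp] = offset


store = {}                          # shared by every consumer in the group
tp = ("orders", 0)                  # (topic, partition) pair used as the key

first, second = FakeConsumer(), FakeConsumer()
first.seek(tp, 42)                  # pretend the first consumer has read up to offset 42

# A rebalance moves the partition from the first consumer to the second.
ExternalOffsetListener(first, store).on_partitions_revoked([tp])
ExternalOffsetListener(second, store).on_partitions_assigned([tp])
```

After the simulated rebalance, the second consumer continues from offset 42 without consulting Kafka's own committed offsets.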

Upvotes: 1
