Reputation: 73
I'm working on saving consumer state during a Kafka rebalance. I noticed that the pseudo-code in the ConsumerRebalanceListener javadoc uses external storage to save offsets.
I want to know: is there any benefit for system safety or robustness (setting aside business logic requirements) in using external storage instead of Kafka (ZooKeeper or the __consumer_offsets topic) to manage offsets, e.g. better handling of network issues? Thanks.
Upvotes: 2
Views: 3689
Reputation: 1541
One potential benefit is restoring from backup in a clean and clear way:
If your app stores state, and updates that state in the same transaction in which it stores the offset of the originating event, then restoring a database snapshot or backup is effectively complete and consistent time travel for the whole app. It can keep processing the feed from its last known offset (from the backup), just as it would after a regular restart, and the stored state will remain consistent with the feed. It will never know that some of the events are replays.
It all depends on the overall design and value chain, of course. Downstream dependencies might not respond well to the replays, especially if the results depend on more than just the incoming events, and end up different than in the first pass.
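To make the backup-and-replay idea concrete, here is a minimal sketch in plain Java. It does not use the Kafka client at all: StateStore, Snapshot and ReplayDemo are hypothetical names, the list stands in for a partition, and the synchronized methods stand in for a database transaction that updates state and offset together.

```java
import java.util.List;

// Hypothetical in-memory stand-in for a transactional database: the
// application state and the offset of the next event always change together.
final class StateStore {
    // Point-in-time copy of (state, offset), like a database backup.
    static final class Snapshot {
        final long total;
        final long nextOffset;
        Snapshot(long total, long nextOffset) {
            this.total = total;
            this.nextOffset = nextOffset;
        }
    }

    private long total;       // application state: running sum of event values
    private long nextOffset;  // offset of the next event to consume

    // One "transaction": apply the event and advance the offset, or do neither.
    synchronized void apply(long offset, long value) {
        if (offset != nextOffset) {
            throw new IllegalStateException("expected offset " + nextOffset + ", got " + offset);
        }
        total += value;
        nextOffset = offset + 1;
    }

    synchronized Snapshot snapshot() {
        return new Snapshot(total, nextOffset);
    }

    synchronized void restore(Snapshot s) {
        total = s.total;
        nextOffset = s.nextOffset;
    }
}

final class ReplayDemo {
    // Consume from the store's own offset, as the app would after any restart.
    static void consume(StateStore store, List<Long> log) {
        for (long offset = store.snapshot().nextOffset; offset < log.size(); offset++) {
            store.apply(offset, log.get((int) offset));
        }
    }

    public static void main(String[] args) {
        List<Long> log = List.of(5L, 7L, 11L, 13L);  // stand-in for a partition
        StateStore store = new StateStore();

        consume(store, log.subList(0, 2));           // process the first two events
        StateStore.Snapshot backup = store.snapshot();
        consume(store, log);                         // process the rest
        long firstPassTotal = store.snapshot().total;

        store.restore(backup);                       // "time travel" to the backup
        consume(store, log);                         // replay from the backed-up offset
        System.out.println(firstPassTotal == store.snapshot().total);
    }
}
```

Because the offset travels with the state, the restored app resumes at the right position without consulting any offset kept elsewhere. With a real consumer, the equivalent step would be seeking to the stored offset when partitions are assigned.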
Upvotes: 0
Reputation: 26885
The main use case for storing offsets outside of Kafka is when the consuming application needs to store the offsets and the consumed/processed messages together. This allows a single (ideally atomic) write to one system that carries both values (offsets and messages) together.
Otherwise, the application effectively needs to do 2 "writes" to store the messages in an external system and then store the offsets into Kafka.
This is explained in more detail in the Storing Offsets Outside Kafka section of the KafkaConsumer javadoc.
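The difference between one write and two can be illustrated with a small simulation. This is only a sketch in plain Java with no Kafka client involved: OffsetStorageDemo, Sink and the crashAt parameter are hypothetical, the list stands in for a partition, and the returned committed value stands in for an offset committed to Kafka.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetStorageDemo {
    // Hypothetical external system that stores processed messages.
    static final class Sink {
        final List<String> messages = new ArrayList<>();
        long nextOffset = 0;  // used only when offsets are kept in the sink itself

        // Stands in for one database transaction: both fields change or neither does.
        synchronized void writeAtomically(String message, long offset) {
            messages.add(message);
            nextOffset = offset + 1;
        }
    }

    // Two writes: the message goes to the sink, then the offset is committed
    // elsewhere. crashAt simulates the process dying between the two writes.
    static long processWithTwoWrites(List<String> log, Sink sink, long committed, int crashAt) {
        for (int i = (int) committed; i < log.size(); i++) {
            sink.messages.add(log.get(i));        // write 1: message
            if (i == crashAt) return committed;   // crash before write 2
            committed = i + 1;                    // write 2: offset
        }
        return committed;
    }

    // One write: message and offset are stored together, so a crash can only
    // fall before or after the whole write, never in the middle.
    static void processWithSingleWrite(List<String> log, Sink sink, int crashAt) {
        for (int i = (int) sink.nextOffset; i < log.size(); i++) {
            if (i == crashAt) return;             // crash before the write
            sink.writeAtomically(log.get(i), i);
        }
    }

    public static void main(String[] args) {
        List<String> log = List.of("a", "b", "c");

        Sink twoWrites = new Sink();
        long committed = processWithTwoWrites(log, twoWrites, 0, 1);  // crash at offset 1
        processWithTwoWrites(log, twoWrites, committed, -1);          // restart, no crash
        System.out.println(twoWrites.messages);   // [a, b, b, c]

        Sink singleWrite = new Sink();
        processWithSingleWrite(log, singleWrite, 1);                  // crash at offset 1
        processWithSingleWrite(log, singleWrite, -1);                 // restart, no crash
        System.out.println(singleWrite.messages); // [a, b, c]
    }
}
```

In the two-write run, message b is stored twice because the crash landed after the message write but before the offset commit. Writing the sink first gives at-least-once behaviour; committing the offset first would instead risk losing a message.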
Upvotes: 1