Reputation: 607
We want every one of our Kafka consumers to receive all the messages from Kafka (from all partitions), so we generate a unique group.id
on the fly on each machine. Any new machine that joins also gets its own new group.id,
so this logic works and all our Kafka consumers (machines) get all the data from the Kafka cluster.
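The broadcast approach described above can be sketched like this; the function name and the `broadcast-` prefix are illustrative, not from the question, and the dict mirrors the config a real consumer client would be given:

```python
import uuid

def build_broadcast_config(bootstrap_servers):
    """Build a consumer config with a one-off group.id so this machine
    sits in its own consumer group and is assigned every partition."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # A fresh UUID per machine means no two consumers share a group,
        # so each one independently reads the full topic.
        "group.id": f"broadcast-{uuid.uuid4()}",
        # A brand-new group.id has no committed offsets, so tell the
        # consumer where to start reading the first time.
        "auto.offset.reset": "earliest",
    }

config = build_broadcast_config("broker1:9092")
```

Because each machine generates its own group.id, each one also tracks and commits its own offsets independently of the others.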
Now my confusion is -
What's the best way with this approach to make sure we are not losing any data?
Upvotes: 0
Views: 729
Reputation: 141
Q. Are there any drawbacks with this approach where we can lose data?
Ans - There is no drawback to having multiple Kafka consumers as such, but you need to choose your settings so that you don't lose any data. You can refer to this article, which describes the situations in which Kafka can lose data and which settings help avoid them: https://blog.softwaremill.com/help-kafka-ate-my-data-ae2e5d3e6576
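As a rough sketch, the durability-related settings usually discussed in this context look like the following; the exact values are assumptions for illustration (the linked article covers the reasoning), not quotes from it:

```python
# Commonly cited settings for avoiding data loss (illustrative values).
# Topic/broker side:
TOPIC_SETTINGS = {
    "replication.factor": 3,            # keep three copies of each partition
    "min.insync.replicas": 2,           # require >=2 replicas to ack a write
    "unclean.leader.election.enable": False,  # never promote a stale replica
}

# Producer side:
PRODUCER_SETTINGS = {
    "acks": "all",                      # wait for all in-sync replicas
}
```

The consumer-side half of the picture is the offset-commit strategy, addressed in the next question.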
Q. How should we commit offsets if we go with this approach? Should we enable auto-commit, or should we commit manually once we have processed the data? Also, is it okay for all the machines (consumers) to commit offsets manually once they have processed the data, independently of the other machines, or will that cause any problem?
Ans - If you don't want to lose data, you should commit offsets manually, after processing: with auto-commit, an offset can be committed before processing has actually finished, so if your process dies while partway through a batch, those partially processed messages are skipped on restart and effectively lost. With manual commits after processing, a crash just means the uncommitted messages are re-read and processed again. Since each machine has its own group.id, committing independently is fine; each group's offsets are tracked separately.
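The poll-process-commit ordering above can be sketched as a small loop; `consumer` here is a stand-in for a real Kafka consumer client configured with auto-commit disabled (e.g. `enable.auto.commit: false`), and all names are illustrative:

```python
def consume_with_manual_commit(consumer, process):
    """Poll, process, then commit -- in that order.

    If the process dies mid-batch, the offsets were never committed,
    so those records are re-read on restart (at-least-once delivery).
    """
    while True:
        records = consumer.poll(timeout_ms=1000)
        if records is None:
            break  # illustrative stop condition; real loops run forever
        for record in records:
            process(record)   # handle the message first...
        consumer.commit()     # ...then commit, never the other way round
```

The trade-off is that at-least-once delivery means duplicates are possible after a crash, so `process` should be idempotent where it can be.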
Also, please refer to the blog posts below, which walk through all the situations in which Kafka can lose data: https://jack-vanlightly.com/blog/2018/9/14/how-to-lose-messages-on-a-kafka-cluster-part1 https://jack-vanlightly.com/blog/2018/9/18/how-to-lose-messages-on-a-kafka-cluster-part-2
Upvotes: 1