Reputation: 1024
We have a working RabbitMQ implementation, but due to volume we are planning to switch to Kafka.
I have a doubt about one point.
In RabbitMQ, when the consumer consumes a message from the queue, the message goes into a different state, the unacked state. The client/consumer takes some time to process the message and, upon successful processing, sends an acknowledgement back to the queue, at which point the message is deleted from the queue. If processing is unsuccessful and the queue doesn't receive an acknowledgement within a defined period, the message is appended at the end of the queue. In this way we don't lose any messages.
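For context, this is roughly the flow I mean, as a minimal sketch with the RabbitMQ Java client and manual acknowledgements (the queue name and processing step are illustrative, not our actual code):

    import java.nio.charset.StandardCharsets;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;

    public class ManualAckConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection connection = factory.newConnection();
            Channel channel = connection.createChannel();

            DeliverCallback onDeliver = (consumerTag, delivery) -> {
                long deliveryTag = delivery.getEnvelope().getDeliveryTag();
                try {
                    process(new String(delivery.getBody(), StandardCharsets.UTF_8));
                    channel.basicAck(deliveryTag, false);        // success: message is deleted
                } catch (Exception e) {
                    channel.basicNack(deliveryTag, false, true); // failure: requeue for redelivery
                }
            };

            // autoAck=false: delivered messages sit in the unacked state
            // until we explicitly ack or nack them.
            channel.basicConsume("work-queue", false, onDeliver, consumerTag -> { });
        }

        private static void process(String body) {
            // Placeholder for the real processing logic.
        }
    }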
With my little knowledge of Kafka, I understand that if, for example, message 100 was not successfully processed, the offset is not increased, but it will be increased if message 101 is processed successfully. So I lose message 100.
Is there a way to guarantee that none of the messages will be lost?
Upvotes: 3
Views: 7539
Reputation: 744
I also faced the same question. To put it simply: RabbitMQ keeps track of the acknowledgement state of each individual message; Kafka doesn't, so you can't have that ready-made, you have to implement it yourself.
There are options available though. You can use kmq, at the cost of throughput dropping below 50%; have a look:
https://softwaremill.com/kafka-with-selective-acknowledgments-performance/
Upvotes: 2
Reputation: 16824
You should read a little bit about how message consumption in Kafka works. Here's a link to the consumer section of the official Kafka docs: https://kafka.apache.org/documentation/#theconsumer
Basically, in Kafka, messages are only deleted after enough time has passed, and that is configured using the log.retention.hours, log.retention.minutes and log.retention.ms settings, like @Amin has said.
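For example, a broker-level retention setting in server.properties might look like this (the value is illustrative; if more than one of these is set, log.retention.ms takes precedence over log.retention.minutes, which takes precedence over log.retention.hours):

    # Keep messages for 7 days before log segments become eligible for deletion.
    log.retention.hours=168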
In Kafka, any number of consumers can start consuming messages from any topic, at any moment, regardless of whether other consumers are already consuming from that same topic. Kafka keeps track of where each consumer is, on each topic/partition, using offsets that are stored in Kafka itself. So, if your consumer needs to consume message 100, like you described in your question, you can simply "rewind" to the desired message, and start consuming normally again. It doesn't matter if you had previously consumed it, or if other consumers are reading from that topic or not.
From the official Kafka docs:
A consumer can deliberately rewind back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if the consumer code has a bug and is discovered after some messages are consumed, the consumer can re-consume those messages once the bug is fixed.
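A minimal sketch of that rewind using the Java consumer with manual partition assignment (topic, partition and offset values are illustrative):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RewindConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "my-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Manual assignment so we control exactly where to read from.
                TopicPartition partition = new TopicPartition("my-topic", 0);
                consumer.assign(Collections.singletonList(partition));

                // Rewind to message 100 and re-consume from there.
                consumer.seek(partition, 100L);

                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }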
Upvotes: 0
Reputation: 1013
Kafka doesn't remove messages from topics unless one of the log.retention.bytes, log.retention.hours, log.retention.minutes or log.retention.ms limits is reached. So if the offset increases you don't lose the previous messages, and you can simply change the offset to the position that you want.
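If you'd rather reset a whole consumer group's position from the command line instead of in code, the kafka-consumer-groups tool that ships with Kafka can do it (group, topic and target offset here are illustrative; the group must be inactive while you run it):

    bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
      --group my-group --topic my-topic \
      --reset-offsets --to-offset 100 --execute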
Upvotes: 2
Reputation: 6953
Your offset will not be advanced unless you poll for new messages, so what you should be concerned about is reprocessing messages, not losing them.
If you want to store the results of your processing back in the Kafka cluster, you can use Kafka's transactions feature. That way you get exactly-once delivery: either all of your changes are saved, or none of them are.
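A minimal consume-process-produce sketch using Kafka transactions (topic names, group id, transactional id and the processing step are illustrative):

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;

    public class TransactionalProcessor {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "localhost:9092");
            consumerProps.put("group.id", "my-group");
            consumerProps.put("enable.auto.commit", "false");       // offsets go through the transaction
            consumerProps.put("isolation.level", "read_committed"); // only see committed data
            consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "localhost:9092");
            producerProps.put("transactional.id", "my-processor-1"); // enables transactions
            producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
                 KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                consumer.subscribe(Collections.singletonList("input-topic"));
                producer.initTransactions();

                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    if (records.isEmpty()) continue;

                    producer.beginTransaction();
                    try {
                        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                        for (ConsumerRecord<String, String> record : records) {
                            String result = record.value().toUpperCase(); // placeholder processing
                            producer.send(new ProducerRecord<>("output-topic", record.key(), result));
                            offsets.put(new TopicPartition(record.topic(), record.partition()),
                                        new OffsetAndMetadata(record.offset() + 1));
                        }
                        // Commit the consumed offsets atomically with the produced results.
                        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                        producer.commitTransaction();
                    } catch (Exception e) {
                        // Nothing from this batch is visible to read_committed consumers, and the
                        // offsets were not committed, so the batch is reprocessed after a restart.
                        producer.abortTransaction();
                        throw new RuntimeException(e);
                    }
                }
            }
        }
    }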
Another approach is to make your processing idempotent. Assign a unique ID to each message in Kafka; when you process a message, store that ID in a database. After a crash, you check whether a message's ID is already in the database to know whether it has already been processed.
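A minimal sketch of that dedup check using JDBC (the table name and schema are illustrative; any durable store works):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class IdempotentHandler {
        private final Connection db;

        public IdempotentHandler(String jdbcUrl) throws Exception {
            this.db = DriverManager.getConnection(jdbcUrl);
        }

        /** Processes the message only if its ID has not been seen before. */
        public void handle(String messageId, String payload) throws Exception {
            if (alreadyProcessed(messageId)) {
                return; // duplicate delivery after a crash or rebalance; skip it
            }
            process(payload);
            markProcessed(messageId); // ideally in the same DB transaction as process()
        }

        private boolean alreadyProcessed(String messageId) throws Exception {
            try (PreparedStatement stmt =
                     db.prepareStatement("SELECT 1 FROM processed_messages WHERE id = ?")) {
                stmt.setString(1, messageId);
                try (ResultSet rs = stmt.executeQuery()) {
                    return rs.next();
                }
            }
        }

        private void markProcessed(String messageId) throws Exception {
            try (PreparedStatement stmt =
                     db.prepareStatement("INSERT INTO processed_messages (id) VALUES (?)")) {
                stmt.setString(1, messageId);
                stmt.executeUpdate();
            }
        }

        private void process(String payload) {
            // Placeholder for the real processing logic.
        }
    }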
Upvotes: 0