Reputation: 511
Can someone help me understand why we need the dead-letter queue (DLQ) mechanism when we have Kafka consumer offsets to our rescue. If I receive a message in my Kafka consumer, I can always choose when to commit my offsets. So if some failure happens, for example the consumer shuts down while processing a message, the offset commit (which happens in code) is not performed, and when the consumer comes back up it reads from the last committed position. Isn't this enough to keep things simple? Why do we need to enable a DLQ and route failed messages to it? Is there any added advantage, or am I missing something important? By having a DLQ I have to write code to send messages from the DLQ back to the main topic, thus complicating things.
Upvotes: 4
Views: 6215
Reputation: 920
Kafka consumers poll records from the source topic in batches. Even though you can set the batch size to one, that is never recommended due to poor performance.
So think of a situation where your batch size is 10, the consumer is processing records one by one, and a bad event in the batch throws an exception in the application logic (yes, you can avoid bad data with a strict schema definition, but in the real world things can break). When the consumer comes back after the crash, since there was no commit for the current batch, it polls the same set again and the loop continues. A DLQ gives us an option to route the bad records to another topic, which can be inspected later, rather than blocking the entire system because of "one" bad message.
Events in the DLQ need to be inspected, together with the logs, to identify why they failed. Mostly these are edge conditions or a bug in the existing system.
It's not easy to skip an event in a topic, even using the Kafka broker's admin tools, and it is a heavy operational burden for the ops team if you run into a crash loop in a higher environment. So the best option is to always have a DLQ wherever the option exists.
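The routing pattern described above can be sketched in plain Python. This is a simulation of the poll/process loop only, not a real Kafka client; `process`, `consume_batch`, and the sample records are all illustrative names, not part of any Kafka API:

```python
# Sketch of DLQ routing: process a batch, divert records that fail
# instead of crashing, then commit past the whole batch.
# Plain-Python simulation; no broker or Kafka library involved.

def process(record):
    """Hypothetical business logic: rejects malformed records."""
    if "value" not in record:
        raise ValueError(f"bad record at offset {record['offset']}")
    return record["value"].upper()

def consume_batch(batch, dlq):
    """Process every record, routing failures to the DLQ list.

    Returns the offset to commit (last offset in the batch + 1).
    Committing the full batch is only safe because failures were
    captured in the DLQ rather than silently dropped.
    """
    for record in batch:
        try:
            process(record)
        except Exception as exc:
            # Instead of crashing (and re-polling the same batch forever),
            # park the record plus the error for later inspection.
            dlq.append({"record": record, "error": str(exc)})
    return batch[-1]["offset"] + 1

batch = [
    {"offset": 0, "value": "a"},
    {"offset": 1},                # the "one" bad message
    {"offset": 2, "value": "c"},
]
dlq = []
next_offset = consume_batch(batch, dlq)
print(next_offset)   # 3: the whole batch can be committed
print(len(dlq))      # 1: the bad record is parked, not lost
```

In a real application the `dlq.append(...)` would be a produce call to a dedicated DLQ topic, done before committing the batch's offsets.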
Upvotes: 0
Reputation: 1856
When you commit offset N, the next time you consume you will start fetching at N+1. You don't have the granularity to commit each message individually.
In other words, if you've got 10 messages in your partition and message 5 fails to be processed, you cannot mark any message after 5 as processed if you still want to be able to consume message 5 again in the future.
So the problem is when only a few messages in a batch fail to be processed: those are the ones you send to the DLQ, so you can commit past them and keep the partition moving.
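The granularity point above can be illustrated with a small model. This is an assumption-laden sketch (a `Partition` toy class, not the real consumer API): a committed offset marks everything before it as done, so there is no way to "hold back" one failed message in the middle:

```python
# Illustrates commit granularity: committing offset N means
# "resume at N", with no per-message exceptions below N.
# Plain-Python model, not a real Kafka consumer.

class Partition:
    def __init__(self, messages):
        self.messages = messages
        self.committed = 0          # next offset to fetch on restart

    def poll(self):
        return self.messages[self.committed:]

    def commit(self, offset):
        # Everything below `offset` is now considered processed.
        self.committed = offset

p = Partition(["m0", "m1", "m2", "m3", "m4"])

# Suppose m2 fails. If we commit only up to m2...
p.commit(2)
print(p.poll())    # ['m2', 'm3', 'm4'] -- m3 and m4 get replayed too

# ...the only way to commit past m2 without losing it is to park it
# elsewhere first (the DLQ), then commit beyond the full batch:
dlq = ["m2"]
p.commit(5)
print(p.poll())    # [] -- the partition moves on; m2 lives in the DLQ
```

This is exactly the trade-off the answer describes: either replay good messages alongside the bad one, or route the bad one to a DLQ and advance the offset.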
Upvotes: 4