Reputation: 6340
I'm using Apache Storm for parallel processing. I'd like to detect when a tuple is on its last replay, so that if it fails again it can be moved to a dead letter queue.
Is there a way to find the replay count from within the Bolt? I'm not able to find such a field within the tuple.
The reason I'm looking for the replay count is to make our topology more resilient to failures caused by bugs and downstream service outages. Once the bug or downstream issue has been resolved, the tuples can be reprocessed from the dead letter queue. However, I'd like to place a tuple on the dead letter queue only on its final replay.
Upvotes: 0
Views: 874
Reputation: 62330
There are multiple possible answers to this question:
Do you use the low-level Java API to define your topology? If so, see here: Storm: Is it possible to limit the number of replays on fail (Anchoring)?
You can also use transactional topologies. The documentation is here: https://storm.apache.org/documentation/Transactional-topologies.html
Limiting the number of replays implies counting them, so counting is a prerequisite for what you want. However, Storm does not natively support a dead letter queue or anything similar. You would need to use a reliable external distributed storage system (maybe Kafka) and put the tuple there if its replay count exceeds your threshold. In your spout, you then need to check this external storage periodically for tuples. Once they have been stored there "long enough" (whatever that means in your application), the spout can try re-processing them.
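To make the counting idea concrete, here is a minimal sketch in plain Java with no Storm dependency. `ReplayTracker`, `DeadLetter`, `onAck`, `onFail` and `drainRetryable` are hypothetical names made up for this illustration, and the in-memory queue is only a stand-in for a durable external store such as a Kafka topic. It assumes the spout emits each tuple with a message ID and forwards its `ack(Object msgId)` and `fail(Object msgId)` callbacks to the tracker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical helper: counts failures per message ID and parks exhausted messages. */
class ReplayTracker {

    /** A parked message ID together with the time it was parked. */
    static final class DeadLetter {
        final Object msgId;
        final long parkedAtMillis;

        DeadLetter(Object msgId, long parkedAtMillis) {
            this.msgId = msgId;
            this.parkedAtMillis = parkedAtMillis;
        }
    }

    private final int maxReplays;
    private final Map<Object, Integer> failCounts = new HashMap<>();
    // Assumption: in-memory stand-in for reliable external storage (e.g. Kafka).
    private final Queue<DeadLetter> deadLetters = new ArrayDeque<>();

    ReplayTracker(int maxReplays) {
        this.maxReplays = maxReplays;
    }

    /** Call from the spout's ack(msgId): the tuple tree succeeded, so forget its count. */
    void onAck(Object msgId) {
        failCounts.remove(msgId);
    }

    /**
     * Call from the spout's fail(msgId).
     * Returns true if the caller should re-emit the tuple, or false if the
     * replay budget is used up and the message was parked as a dead letter.
     */
    boolean onFail(Object msgId, long nowMillis) {
        int failures = failCounts.merge(msgId, 1, Integer::sum);
        if (failures <= maxReplays) {
            return true;
        }
        failCounts.remove(msgId);
        deadLetters.add(new DeadLetter(msgId, nowMillis));
        return false;
    }

    /**
     * Removes and returns the IDs that have been parked for at least minAgeMillis.
     * Assumes nowMillis never goes backwards, so the queue is ordered by park time.
     */
    List<Object> drainRetryable(long nowMillis, long minAgeMillis) {
        List<Object> ready = new ArrayList<>();
        while (!deadLetters.isEmpty()
                && nowMillis - deadLetters.peek().parkedAtMillis >= minAgeMillis) {
            ready.add(deadLetters.poll().msgId);
        }
        return ready;
    }
}
```

With `maxReplays = 2`, a message gets its original attempt plus two replays; the third failure parks it. The spout would call `drainRetryable` periodically (for example from `nextTuple`) and re-emit whatever comes back. Keeping the counts in spout memory means they are lost if the spout restarts, so a real setup would probably need to persist them alongside the dead letters.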
Upvotes: 1