Reputation: 121
I'm working on a relatively simple workflow using Amazon's Flow framework for Java. I think I have a decent grasp of everything that's going on right now, but I have one area I'm still uncertain about: how should I go about handling timeouts?
The main timeout in my workflow is the executionStartToCloseTimeoutSeconds on the workflow itself, but I'd imagine the process is the same regardless of which timeout fires. Most of the time, when the workflow times out, it simply disappears. I'd like to know when this happens and react to it (e.g. send an e-mail or log it somehow). I searched around and couldn't find any example of code being notified that a timeout happened.
Upvotes: 1
Views: 3274
Reputation: 6870
An activity timeout is delivered to the workflow code as an exception (ActivityTaskTimedOutException in Flow), so it can be caught and handled like any other failure.
IMHO a workflow execution timeout is similar to kill -9 in Unix. It kills the workflow without giving it a chance to perform cleanup. So its main use is to ensure that broken workflow instances do not stay open forever.
For all business-level timeouts, do not rely on workflow timeouts; use timers instead. When the timer fires, your workflow code can execute a notification activity and then close the workflow with an appropriate failure status.
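To make the timer idea concrete, here is a minimal sketch of the control flow in plain Java. It deliberately does not use the Flow framework API: `CompletableFuture` stands in for the workflow's main work and for the timer, and the names `BusinessDeadline`, `runWithDeadline`, `Outcome` and the `notifier` callback are hypothetical, made up for illustration. In a real Flow workflow the timer would presumably come from the framework's workflow clock and the notifier would be an activity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class BusinessDeadline {
    /** Status the run closes with (hypothetical names). */
    public enum Outcome { COMPLETED, FAILED_DEADLINE }

    // Unique marker object so "timer fired" can never be confused with a work result.
    private static final Object TIMER_FIRED = new Object();

    /**
     * Races the work against a business deadline that is shorter than any
     * hard execution timeout. If the deadline wins, the notifier runs and
     * the run closes with a failure status instead of silently vanishing.
     */
    public static Outcome runWithDeadline(CompletableFuture<String> work,
                                          long deadlineMillis,
                                          Consumer<String> notifier) {
        // The "timer": completes with the marker once the deadline elapses.
        CompletableFuture<Object> timer = new CompletableFuture<>()
                .completeOnTimeout(TIMER_FIRED, deadlineMillis, TimeUnit.MILLISECONDS);

        // Wait for whichever finishes first.
        Object first = CompletableFuture.anyOf(work, timer).join();

        if (first == TIMER_FIRED) {
            // Stand-in for the notification activity (e-mail, log entry, ...).
            notifier.accept("business deadline of " + deadlineMillis + " ms exceeded");
            work.cancel(true); // give up on the outstanding work
            return Outcome.FAILED_DEADLINE;
        }
        timer.cancel(false); // work finished first; the timer is no longer needed
        return Outcome.COMPLETED;
    }

    public static void main(String[] args) {
        // Work that never finishes, so the deadline fires.
        CompletableFuture<String> stuck = new CompletableFuture<>();
        Outcome outcome = runWithDeadline(stuck, 100, msg -> System.out.println("NOTIFY: " + msg));
        System.out.println("closed with status " + outcome);
    }
}
```

The point of the design is that the deadline is handled inside the workflow logic, where cleanup and notification are still possible, while the hard execution timeout stays in place only as a backstop.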
Upvotes: 3
Reputation: 10566
http://docs.aws.amazon.com/amazonswf/latest/developerguide/swf-timeout-types.html
For activity-related timeouts, the short answer is that your decider (i.e. workflow) logic should handle them. Once you have validated that logic and put retries in place, you should not have to worry about individual activities timing out.
For workflow timeouts, you will need to inspect the workflow history / state to figure out that the execution timed out. You can definitely list workflow executions, but you will probably have to go through the SWF API directly (i.e. not through Flow). You want to do this anyway to catch failed workflows.
A pattern I've used and seen being used with SWF is to have an external way of keeping track of the work you've dispatched through SWF (think a DB) and use that to check in on work that was started and never completed. The workflow itself updates this when it completes (or as it's completing major pieces of work) so it's trivial to figure out which workflows are problematic.
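As a rough illustration of that tracking pattern, the sketch below uses an in-memory map where a real system would use a database table. Everything in it (`WorkTracker`, `recordStarted`, `recordCompleted`, `findOverdue`) is a hypothetical name chosen for this example, not part of SWF or Flow. The assumption is that the code that starts a workflow calls `recordStarted`, the workflow's final step calls `recordCompleted`, and a periodic job calls `findOverdue` to spot executions that started but never reported back.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WorkTracker {
    // Workflow id -> start time, for work that has not reported completion yet.
    // A real deployment would keep this in a durable store such as a database.
    private final Map<String, Instant> open = new ConcurrentHashMap<>();

    /** Called by whatever dispatches the workflow. */
    public void recordStarted(String workflowId, Instant startedAt) {
        open.put(workflowId, startedAt);
    }

    /** Called by the workflow itself as its last step. */
    public void recordCompleted(String workflowId) {
        open.remove(workflowId);
    }

    /** Returns ids of work that started more than maxAge ago and never completed. */
    public List<String> findOverdue(Instant now, Duration maxAge) {
        List<String> overdue = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : open.entrySet()) {
            if (entry.getValue().plus(maxAge).isBefore(now)) {
                overdue.add(entry.getKey());
            }
        }
        Collections.sort(overdue); // stable order for reporting
        return overdue;
    }

    public static void main(String[] args) {
        WorkTracker tracker = new WorkTracker();
        Instant start = Instant.now().minus(Duration.ofHours(2));
        tracker.recordStarted("order-1", start);
        tracker.recordStarted("order-2", start);
        tracker.recordCompleted("order-1");
        // order-2 never completed, so it shows up as a candidate for alerting.
        System.out.println("overdue: " + tracker.findOverdue(Instant.now(), Duration.ofHours(1)));
    }
}
```

Because the check runs outside the workflow, it still catches executions that were killed by the workflow execution timeout and therefore never got a chance to notify anyone themselves.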
Upvotes: 2