MonkeyWithDarts

Reputation: 775

Programmatically Re-Running SWF Workflows

We have a few thousand SWF workflows that have failed over the past year due to various activity bugs. Because the bugs were long-lived, all activity retries failed and the workflows were closed. I want to re-run all of those failed workflows, picking up at the activity that was last executed (and failed). A basic workflow retrigger.

The SWF console has a Re-Run command, but it only lets you select twenty-five workflows at a time, far fewer than the thousands I need.

I could use the CLI start-workflow-execution command (or analogous API call), but I can't figure out where to get the most recent workflow input the way the Console's 'Re-Run' operation does. I can get the most recent workflow input from get-workflow-execution-history, but that requires that I know the most recent runId and I can't find any way to get that.

To summarize:

  1. The only way I can think to programmatically re-run SWF workflows is: for each failed workflow, magically grab its most recent runId, then grab its most recent workflow input via get-workflow-execution-history, then restart it using that input via start-workflow-execution. Is there a better way?
  2. If the answer to #1 is "There is no better way," then how can I find the most recent runId for a particular workflowId?

(The fact that I can't find any documentation or discussion on such retriggers makes me worry that I am approaching this the wrong way, so I welcome feedback setting me straight.)

UPDATE: Higher level question: What is the right way to handle workflows that terminated due to error conditions that outlasted all retries? The fact that it is so difficult to retrigger SWF workflows makes me think I am misunderstanding the SWF paradigm.

Upvotes: 1

Views: 2146

Answers (2)

Maxim Fateev

Reputation: 6870

  1. It sounds reasonable. Note that re-executing a workflow doesn't restart it from the last failed activity but from the beginning (the new run's history is empty).
  2. You can use ListClosedWorkflowExecutions to get the most recent runId. Note that it supports workflowId as a filter parameter.
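
As a rough illustration of the two steps above, here is a minimal Python sketch. It assumes a boto3 SWF client is passed in as `swf`; the helper names (`latest_closed_run_id`, `original_start_attributes`, `rerun`) and the 90-day lookback default are my own choices for illustration, not anything SWF provides. It relies on ListClosedWorkflowExecutions returning newest runs first by default and on the first history event being WorkflowExecutionStarted.

```python
from datetime import datetime, timedelta, timezone

# Start attributes copied to the new run when the original run recorded them.
COPIED_KEYS = ("input", "tagList", "childPolicy",
               "executionStartToCloseTimeout", "taskStartToCloseTimeout")


def latest_closed_run_id(swf, domain, workflow_id, lookback_days=90):
    """Return the runId of the most recently started closed run, or None."""
    now = datetime.now(timezone.utc)
    resp = swf.list_closed_workflow_executions(
        domain=domain,
        startTimeFilter={"oldestDate": now - timedelta(days=lookback_days),
                         "latestDate": now},
        executionFilter={"workflowId": workflow_id},
    )
    infos = resp.get("executionInfos", [])
    # Default ordering is descending by start time, so index 0 is the newest run.
    return infos[0]["execution"]["runId"] if infos else None


def original_start_attributes(swf, domain, workflow_id, run_id):
    """Read the attributes of the WorkflowExecutionStarted event of a run."""
    resp = swf.get_workflow_execution_history(
        domain=domain,
        execution={"workflowId": workflow_id, "runId": run_id},
        maximumPageSize=1,  # history is ascending by default; only the first event is needed
    )
    first = resp["events"][0]
    if first["eventType"] != "WorkflowExecutionStarted":
        raise RuntimeError("unexpected first event: " + first["eventType"])
    return first["workflowExecutionStartedEventAttributes"]


def rerun(swf, domain, workflow_id):
    """Start a fresh run with the same input as the latest closed run."""
    run_id = latest_closed_run_id(swf, domain, workflow_id)
    if run_id is None:
        return None  # nothing found inside the lookback window
    attrs = original_start_attributes(swf, domain, workflow_id, run_id)
    kwargs = {
        "domain": domain,
        "workflowId": workflow_id,
        "workflowType": attrs["workflowType"],
        "taskList": attrs["taskList"],
    }
    kwargs.update({k: attrs[k] for k in COPIED_KEYS if k in attrs})
    return swf.start_workflow_execution(**kwargs)["runId"]
```

With boto3 this might be called as `rerun(boto3.client("swf"), "my-domain", "my-workflow-id")`, where both strings are placeholders. The new run still starts from the beginning, as noted in point 1.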

UPDATE: Higher level question: What is the right way to handle workflows that terminated due to error conditions that outlasted all retries?

SWF has everything needed to retry a workflow from the point where it failed, since the whole workflow execution history is preserved. Unfortunately, the AWS Flow Framework doesn't perform state restoration from a previous run out of the box. But that is not an inherent limitation, and the feature could be added.

UPDATE 2:

Temporal, an open source workflow platform based on the same high-level ideas as SWF, does support a reset feature that allows restarting a workflow from any point by creating a new run with a subset of the history.

Upvotes: 3

Rohit

Reputation: 927

I don't think you can do it this way. The maximum workflow history retention is 90 days, so even if you go down the path of fetching the workflow execution history, you will only be able to restart workflows that failed within the last 90 days. AWS also has account-level limits on the number and rate of SWF API calls, so once you start calling in a loop to get history and start workflows, you will hit those limits quickly and start getting exceptions. A better approach is to go back to the point where the workflow executions were originally started and re-run the failed executions from there, passing in the same input.
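
If you do end up looping over SWF API calls anyway, one common way to cope with the rate limits is to retry throttled calls with exponential backoff and jitter. The sketch below is generic Python, not an SWF feature: `call_with_backoff` and its parameters are hypothetical names, and the `is_throttle` predicate is something you would supply yourself (with boto3, I would assume it checks for a `ClientError` whose error code is `ThrottlingException`).

```python
import random
import time


def call_with_backoff(fn, is_throttle, max_attempts=6, base_delay=0.5,
                      sleep=time.sleep):
    """Call fn(), retrying with jittered exponential backoff on throttling.

    fn          -- zero-argument callable wrapping one API call
    is_throttle -- predicate deciding whether an exception means "slow down"
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Give up on non-throttling errors and on the final attempt.
            if not is_throttle(err) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

A call would then look something like `call_with_backoff(lambda: swf.start_workflow_execution(**kwargs), is_throttle)`. boto3 clients can also be configured with their own retry behaviour through `botocore.config.Config(retries=...)`, which may be enough on its own.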

Upvotes: 1
