nihal

Reputation: 377

How do I determine, at the earliest, if an AWS step function state has failed?

We are implementing a workflow on AWS Step Functions (a state machine) that updates user records and rolls them back if something goes wrong. The state machine does its processing in two parts:

  1. Part1 - updating
  2. Part2 - rollback

When the state machine takes the rollback path, the process takes a very long time, and it is unacceptable to make the client wait that long. The client could, however, be informed just before the rollback starts. I am trying to figure out a way to achieve this.

I have already tried describeExecution(), but the status changes to FAILED only after the state machine has finished executing, which is again too late.

I tried inserting an "SQS send message" step at the point (between part 1 and part 2) where it is likely to fail, and then polling this queue from the orchestration function (the handler of my API endpoint). However, this is not going to work, as I may have hundreds of requests running in parallel and SQS will eventually fail.

Appreciate an early response.

Cheers.

Upvotes: 0

Views: 850

Answers (1)

Alan Bogu

Reputation: 775

First, I'd recommend reading up on error handling in Step Functions: https://docs.aws.amazon.com/step-functions/latest/dg/concepts-error-handling.html

Task, Map, and Parallel states support fallback states, so you could catch the error by adding a Catch field, like so:

"Catch": [ {
   "ErrorEquals": [ "java.lang.Exception" ],
   "ResultPath": "$.error-info",
   "Next": "RecoveryState"
}, {
   "ErrorEquals": [ "States.ALL" ],
   "Next": "EndState"
} ]
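
To connect this to the question, here is a rough sketch (not a definitive design) of a definition in which the Catch on the update step routes to a notification state before the rollback begins. The state names (UpdateRecords, NotifyClient, Rollback, UpdateFailed), the error name, and the three resource ARNs are hypothetical placeholders; only the Catch, ResultPath, and Next fields come from the snippet above.

```python
import json


def build_definition(update_arn, notify_arn, rollback_arn):
    """Build a state machine definition as a dict.

    All state names and ARNs are hypothetical; adapt them to your workflow.
    """
    return {
        "StartAt": "UpdateRecords",
        "States": {
            "UpdateRecords": {
                "Type": "Task",
                "Resource": update_arn,
                # Any error in part 1 jumps to NotifyClient instead of
                # failing the whole execution right away.
                "Catch": [
                    {
                        "ErrorEquals": ["States.ALL"],
                        "ResultPath": "$.error-info",
                        "Next": "NotifyClient",
                    }
                ],
                "End": True,
            },
            "NotifyClient": {
                "Type": "Task",
                "Resource": notify_arn,
                # A null ResultPath discards the task result and passes
                # the state input (including error-info) through unchanged.
                "ResultPath": None,
                "Next": "Rollback",
            },
            "Rollback": {
                "Type": "Task",
                "Resource": rollback_arn,
                "Next": "UpdateFailed",
            },
            "UpdateFailed": {
                "Type": "Fail",
                "Error": "UpdateRolledBack",
                "Cause": "Update failed and was rolled back",
            },
        },
    }


if __name__ == "__main__":
    definition = build_definition("arn:update", "arn:notify", "arn:rollback")
    print(json.dumps(definition, indent=2))
```

The idea is that the client hears about the failure as soon as NotifyClient runs, while the slow rollback continues afterwards.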

If you intend to use the API to get the current state of the execution, you could use GetExecutionHistory. It returns a list of events, and you can check that list for failure events, e.g. those carrying taskFailedEventDetails.
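
A minimal sketch of that check in Python with boto3, assuming the failure shows up as one of the event types listed in the mapping below (extend it for the task types you actually use). The helper names find_first_failure and first_failure_in_execution are made up for illustration; get_paginator and the get_execution_history operation are the boto3 calls being relied on.

```python
# Map failure event types to the key that holds their details.
FAILURE_DETAIL_KEYS = {
    "TaskFailed": "taskFailedEventDetails",
    "LambdaFunctionFailed": "lambdaFunctionFailedEventDetails",
    "ActivityFailed": "activityFailedEventDetails",
    "ExecutionFailed": "executionFailedEventDetails",
}


def find_first_failure(events):
    """Return a summary of the first failure event in `events`, or None."""
    for event in events:
        key = FAILURE_DETAIL_KEYS.get(event.get("type"))
        if key is not None:
            details = event.get(key, {})
            return {
                "id": event.get("id"),
                "type": event["type"],
                "error": details.get("error"),
                "cause": details.get("cause"),
            }
    return None


def first_failure_in_execution(execution_arn, client=None):
    """Page through the execution history and stop at the first failure."""
    if client is None:
        import boto3  # imported lazily so the helper above works without AWS

        client = boto3.client("stepfunctions")
    paginator = client.get_paginator("get_execution_history")
    for page in paginator.paginate(executionArn=execution_arn):
        failure = find_first_failure(page["events"])
        if failure is not None:
            return failure
    return None
```

Calling first_failure_in_execution(execution_arn) periodically from the API handler should surface the failure while the execution is still running, rather than after it ends. Note that this is still polling, so the interval needs to be chosen with the API request limits in mind.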

Upvotes: 0
