zlZimon

Reputation: 2489

How to scale concurrent step function executions and avoid any maxConcurrent exceptions?

Problem: I have a Lambda that produces an array of objects, which in the worst case can contain a few thousand items. Each object in this array should be processed by a Step Function.

I am trying to figure out the most scalable and error-tolerant solution, so that every object is processed by the Step Function.

A complete Step Function execution is not long (under 5 minutes), but in some steps it has to wait for other services before it can continue (WaitForTaskToken). The Step Function contains a few short-running Lambdas.

These are the possibilities I have at the moment:

1. Naive approach: In my head, a few thousand or even tens of thousands of concurrent executions are not a big deal, so why can't I just iterate over the elements and start an execution for each one directly from the Lambda?

2. SQS: The Lambda puts each object into SQS, and another Lambda processes a batch of 10 messages and starts 10 Step Function executions. I could then set a maximum concurrency on the processing Lambda to avoid too many Step Function executions. But this explains some issues with such an approach, where messages might not be processed, and overall this is a lot of overhead, I think.

3. Using a Map State: I could just pass the array to a Map State, which runs the state machine for each object with at most 40 concurrent iterations. But what if the array has more than 40 items? Can I just catch the error and, in an error-catch state, retry the objects that were not processed until all executions are either done or failed? That is, if one execution fails, I still want the other 39 executions to run.

4. Split the objects into batches and run them in parallel: Similar to 3, but instead of passing all objects to the Map State, another state splits the array into batches of 40, forwards each batch to the Map State, and waits until it has finished before processing the next batch. So there is one "main" state that runs for a longer time, plus 40 worker states running at the same time.

All of these approaches only take Step Function execution concurrency into account, not Lambda concurrency. Since the Step Function uses Lambdas, there are also a lot of Lambdas running concurrently. Could this be an issue? And if so, how can I mitigate it?

Upvotes: 1

Views: 2331

Answers (1)

fedonev

Reputation: 25669

Inline Map States can handle lots [1] of iterations, but only up to 40 concurrently [2]. Iterations beyond MaxConcurrency don't cause an error. They are simply started later, as running iterations finish.
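To make this concrete, here is a minimal sketch of what such a state machine definition could look like, built as a Python dict and serialized to JSON. The state names (`ProcessAll`, `ProcessOne`, `MarkFailed`), the `$.items` input path and the Lambda ARN are placeholders I made up, not anything from your setup. The `Catch` on the inner task is one possible way (an assumption on my part) to keep a single failing item from failing the whole Map state.

```python
import json


def build_inline_map_definition(worker_lambda_arn, max_concurrency=40):
    """Return an ASL definition (as a dict) with a single Inline Map state."""
    return {
        "StartAt": "ProcessAll",
        "States": {
            "ProcessAll": {
                "Type": "Map",
                # Assumes the execution input looks like {"items": [...]}.
                "ItemsPath": "$.items",
                # Items beyond this limit wait their turn; they do not error.
                "MaxConcurrency": max_concurrency,
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "ProcessOne",
                    "States": {
                        "ProcessOne": {
                            "Type": "Task",
                            "Resource": "arn:aws:states:::lambda:invoke",
                            "Parameters": {
                                "FunctionName": worker_lambda_arn,
                                "Payload.$": "$",
                            },
                            # Route a failed item to MarkFailed instead of
                            # failing the whole Map state.
                            "Catch": [
                                {
                                    "ErrorEquals": ["States.ALL"],
                                    "ResultPath": "$.error",
                                    "Next": "MarkFailed",
                                }
                            ],
                            "End": True,
                        },
                        "MarkFailed": {"Type": "Pass", "End": True},
                    },
                },
                "End": True,
            }
        },
    }


# Placeholder ARN, for illustration only.
definition = build_inline_map_definition(
    "arn:aws:lambda:us-east-1:123456789012:function:worker"
)
definition_json = json.dumps(definition, indent=2)
```

The resulting `definition_json` string is what you would paste into the console or pass as the state machine definition in your deployment tooling.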

If your Step Function is only running ~40 concurrent iterations, Lambda concurrency should not be a constraint either.


  1. I just tested a Map state with 1,000 items. Worked just fine. The Quotas page does not mention an upper limit.

  2. In Distributed mode a Map State can handle 10,000 parallel child executions.
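If you ever need more than 40 at a time, a Distributed Map state could be sketched roughly like this. Again, the state name, the `$.items` path, the ARN and the default values are placeholders, and I have not run this exact definition. `ExecutionType` is set to `STANDARD` on the assumption that the child workflows need WaitForTaskToken, which Express workflows do not support. `ToleratedFailurePercentage` is set so that individual failed items do not fail the whole run.

```python
import json


def build_distributed_map_state(
    worker_lambda_arn,
    max_concurrency=1000,
    tolerated_failure_percentage=100,
):
    """Return a Distributed Map state (as a dict) for use inside a definition."""
    return {
        "Type": "Map",
        # Assumes the execution input looks like {"items": [...]}.
        "ItemsPath": "$.items",
        # Upper bound on child workflow executions running in parallel.
        "MaxConcurrency": max_concurrency,
        # 100 means the Map run is not failed because of failed items.
        "ToleratedFailurePercentage": tolerated_failure_percentage,
        "ItemProcessor": {
            "ProcessorConfig": {
                "Mode": "DISTRIBUTED",
                # STANDARD chosen so the children can use WaitForTaskToken.
                "ExecutionType": "STANDARD",
            },
            "StartAt": "ProcessOne",
            "States": {
                "ProcessOne": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {
                        "FunctionName": worker_lambda_arn,
                        "Payload.$": "$",
                    },
                    "End": True,
                }
            },
        },
        "End": True,
    }


# Placeholder ARN, for illustration only.
distributed_state = build_distributed_map_state(
    "arn:aws:lambda:us-east-1:123456789012:function:worker"
)
distributed_definition = {
    "StartAt": "ProcessAll",
    "States": {"ProcessAll": distributed_state},
}
distributed_json = json.dumps(distributed_definition, indent=2)
```

With a higher MaxConcurrency like this, Lambda concurrency becomes something to watch, so pick a value your account's Lambda limits can absorb.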

Upvotes: 1
