user3691191

Reputation: 597

State machine in AWS (step function?)

I would like some advice on whether Step Functions is suitable for my use case.

I have a bunch of user records generated at random times. I need to do some pre-processing and validation before putting them into a pool. I have a stage that runs periodically (every 1-5 min) to collect records from the pool, combine them, and then publish them.

I need real-time traceability/monitoring of each record, and I need to notify the user once the record is published.

Here is a diagram to illustrate the flow.

[Image: flow diagram]

Is Step Functions suitable for my use case? If not, is there an alternative that would help me simplify the solution? Thanks

Upvotes: 0

Views: 1027

Answers (1)

fedonev

Reputation: 25669

Yes, Step Functions is an option. Step Functions state machines (SMs) add the greatest value over other AWS serverless workflow patterns, such as event-driven or pub/sub, when the scenario involves complex branching or retry logic and observability requirements. SM logic is explicit and visual, which makes the workflow simple to reason about. For each SM execution, you can easily trace the exact path the execution took and where it failed. This added functionality is reflected in its higher cost.

In any case, you need to gather records until it's time to collect them. This batching requirement means that your architecture will need more elements than just a state machine. Here are some ideas:

(1) An SM preprocesses records one by one as they arrive

One option is to use state machines to orchestrate the preprocessing and validation only. Each arriving record event kicks off an SM execution. Pre-processed records go into a queue, from which they are periodically polled and sent to be combined.

[Records EventBridge event] -> [preprocessing SM] -> [Record queue] -> [polling lambda] -> [Combining Service]
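As a rough sketch of what the preprocessing SM in option (1) could look like, the snippet below builds an Amazon States Language definition as a Python dict. It assumes the preprocessing and validation steps are Lambda functions and that the record queue is an SQS queue; the ARNs, the queue URL, the state names, and the retry settings are all made-up placeholders, not something from the question.

```python
import json

# Hypothetical resources; replace with your own ARNs and queue URL.
PREPROCESS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:preprocess-record"
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-record"
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/record-pool"


def lambda_task(function_arn, next_state):
    """Task state that invokes a Lambda, retries failed attempts with backoff,
    and sends the record to the shared failure state once retries run out."""
    return {
        "Type": "Task",
        "Resource": function_arn,
        "Retry": [{
            "ErrorEquals": ["States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0,
        }],
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "RecordRejected"}],
        "Next": next_state,
    }


def build_definition():
    """One execution per record: preprocess, validate, then enqueue."""
    return {
        "Comment": "Preprocess and validate one record, then add it to the pool",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": lambda_task(PREPROCESS_ARN, "Validate"),
            "Validate": lambda_task(VALIDATE_ARN, "AddToPool"),
            "AddToPool": {
                # Step Functions' SQS service integration; sends the state
                # input (the validated record) as the message body.
                "Type": "Task",
                "Resource": "arn:aws:states:::sqs:sendMessage",
                "Parameters": {"QueueUrl": QUEUE_URL, "MessageBody.$": "$"},
                "End": True,
            },
            "RecordRejected": {
                "Type": "Fail",
                "Error": "RecordRejected",
                "Cause": "Preprocessing or validation failed",
            },
        },
    }


if __name__ == "__main__":
    # Print JSON that can be pasted into the console or a deployment template.
    print(json.dumps(build_definition(), indent=2))
```

Because each record gets its own execution, the execution history doubles as the per-record trace, and a rejected record shows up as a failed execution rather than disappearing silently.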

(2) Preprocess and process batched records in an end-to-end state machine

Gather records in a queue as they arrive. A Lambda periodically polls the queue and begins the SM execution on a batch of records. An SM Map state pre-processes and validates the records in parallel, then calls the combining service, all within a single execution. This setup gives you the greatest visibility, but is more complex because you have to handle cases where a single record causes the batched execution to fail.
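One possible way to keep a single bad record from failing the whole batch is to catch errors inside the Map state's per-item workflow. The sketch below assumes the batch arrives as `{"records": [...]}` and that preprocessing and combining are Lambda functions; the ARNs, state names, and the `status`/`error` fields are hypothetical.

```python
import json

# Hypothetical Lambda ARNs; replace with your own.
PREPROCESS_ARN = "arn:aws:lambda:us-east-1:123456789012:function:preprocess-record"
COMBINE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:combine-records"


def build_batch_definition(max_concurrency=10):
    """Map over the batch; a failing record is marked instead of aborting."""
    item_states = {
        "PreprocessAndValidate": {
            "Type": "Task",
            "Resource": PREPROCESS_ARN,
            # Keep the original record and attach the error details to it.
            "Catch": [{
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": "MarkRejected",
            }],
            "End": True,
        },
        "MarkRejected": {
            # Ends the iteration successfully so the Map state keeps going.
            "Type": "Pass",
            "Result": "rejected",
            "ResultPath": "$.status",
            "End": True,
        },
    }
    return {
        "StartAt": "PreprocessEach",
        "States": {
            "PreprocessEach": {
                "Type": "Map",
                "ItemsPath": "$.records",
                "MaxConcurrency": max_concurrency,
                # Older definitions use "Iterator" in place of "ItemProcessor".
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "PreprocessAndValidate",
                    "States": item_states,
                },
                "ResultPath": "$.processed",
                "Next": "Combine",
            },
            "Combine": {"Type": "Task", "Resource": COMBINE_ARN, "End": True},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_batch_definition(), indent=2))
```

With this shape, the combining step receives every record under `$.processed` and would need to skip the ones whose `status` is `"rejected"`. Whether rejected records should be dropped, retried, or sent to a dead-letter queue is a design decision this sketch leaves open.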

[Records arrive] -> [Record source queue] -> [polling lambda gets batch] -> [SM for preprocessing, collecting and combining]
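The polling lambda in this flow could be sketched roughly as follows, assuming the source queue is SQS, each message body is a JSON-encoded record, and the function is triggered on a schedule. The environment variable names, the execution-name format, and the `max_records` cap are my own assumptions.

```python
import json
import os
import time


def collect_batch(sqs, queue_url, max_records=100):
    """Receive up to max_records messages (SQS returns at most 10 per call)."""
    messages = []
    while len(messages) < max_records:
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=min(10, max_records - len(messages)),
            WaitTimeSeconds=1,
        )
        received = response.get("Messages", [])
        if not received:
            break  # queue is empty for now
        messages.extend(received)
    return messages


def start_batch_execution(sqs, sfn, queue_url, state_machine_arn):
    """Start one SM execution for the collected batch, then delete the messages."""
    messages = collect_batch(sqs, queue_url)
    if not messages:
        return None
    records = [json.loads(m["Body"]) for m in messages]
    execution = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=f"batch-{int(time.time())}",
        input=json.dumps({"records": records}),
    )
    # Delete only after the execution has started, so a failed start leaves
    # the messages to reappear once their visibility timeout expires.
    for i in range(0, len(messages), 10):
        chunk = messages[i:i + 10]
        sqs.delete_message_batch(
            QueueUrl=queue_url,
            Entries=[
                {"Id": m["MessageId"], "ReceiptHandle": m["ReceiptHandle"]}
                for m in chunk
            ],
        )
    return execution["executionArn"]


def handler(event, context):
    """Lambda entry point; QUEUE_URL and STATE_MACHINE_ARN are assumed env vars."""
    import boto3  # imported here so the helpers above can be tested without AWS

    return start_batch_execution(
        boto3.client("sqs"),
        boto3.client("stepfunctions"),
        os.environ["QUEUE_URL"],
        os.environ["STATE_MACHINE_ARN"],
    )
```

Keep in mind that an execution's input payload is limited to 256 KiB, so for large batches you would probably pass a reference (for example an S3 key) instead of the records themselves.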

Other

There are plenty of other combinations, including chaining SMs together if necessary, or avoiding SMs altogether. Which option is best for you will depend on which pain points matter most to you: observability, error handling, simplicity, or cost.

Upvotes: 1
