smishra
smishra

Reputation: 3418

Orchestration of jobs using AWS Step functions using EMR Serverless

Recently Amazon launched EMR Serverless and I want to repurpose my exiting data pipeline orchestration that uses AWS Step Functions: There are steps that create EMR cluster, run some lambda functions, submit Spark Jobs (mostly Scala jobs using spark-submit) and finally terminate the cluster. All these steps are of sync type (arn:aws:states:::elasticmapreduce:addStep.sync)

There are documentation and github samples that describe submitting jobs from orchestration framework such as AirFlow but there is nothing that describes how to use AWS Step Function with EMR Serverless. Any help in this regard is appreciated.

Primarily I am interested in repurposing task step function of type arn:aws:states:::elasticmapreduce:addStep.sync that takes parameters such as ClusterId but in case of EMR Serverless there is no such id.

In summary is there equivalent of Call Amazon EMR with Step Functions for EMR Serverless?

Upvotes: 2

Views: 1706

Answers (1)

khalid jahangeer
khalid jahangeer

Reputation: 98

Currently there is no direct integration of EMR Serverless with Step Functions. However a possible solution is adding a Lambda Layer on top and use the SDK to create emr serverless applications and submit jobs. However you would need an additional lambda to implement a poller that tracks the success of the jobs (in case of interdependent jobs) as it is highly likely that the emr job will outrun the 15 min runtime limitation of the lambda.

Late Add-On : Native integration of step functions with emr-serverless was released in Oct 2023. https://aws.amazon.com/blogs/big-data/orchestrate-amazon-emr-serverless-jobs-with-aws-step-functions/

Upvotes: 2

Related Questions