datahack

Reputation: 669

Running AWS Glue jobs in parallel

I have 30 Glue jobs that I want to run in parallel. If one job fails, the others must continue. I started with Step Functions, creating a state machine that executes a runner Lambda function, which in turn triggers a Glue job based on a parameter (the name of the Glue job). For a single job there is already a decent amount of Step Functions logic implemented (retry, error handling, etc.).

Is there any way to execute a state machine from another state machine? That way I could have 30 parallel tasks, each executing another state machine. If you have any suggestions, please feel free to share.
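For reference, this is roughly the shape I have in mind: Step Functions does support starting one state machine from another through the `arn:aws:states:::states:startExecution` service integration, so a Map state could fan out over the 30 job names. A minimal sketch (the state machine ARN, account ID, and `jobNames` input field are placeholders; the `Catch` inside the iterator is what keeps one failed job from stopping the others):

```json
{
  "StartAt": "RunAllJobs",
  "States": {
    "RunAllJobs": {
      "Type": "Map",
      "ItemsPath": "$.jobNames",
      "Iterator": {
        "StartAt": "RunJobStateMachine",
        "States": {
          "RunJobStateMachine": {
            "Type": "Task",
            "Resource": "arn:aws:states:::states:startExecution.sync",
            "Parameters": {
              "StateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:single-glue-job",
              "Input": { "jobName.$": "$$.Map.Item.Value" }
            },
            "Catch": [
              {
                "ErrorEquals": ["States.ALL"],
                "ResultPath": "$.error",
                "Next": "JobFailed"
              }
            ],
            "End": true
          },
          "JobFailed": { "Type": "Pass", "End": true }
        }
      },
      "End": true
    }
  }
}
```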

Upvotes: 1

Views: 7735

Answers (1)

NHol

Reputation: 2125

AWS recommends using SNS for a fan-out architecture when running parallel jobs from a single S3 event, as you get an overlap error if two Lambda functions try to subscribe to the same S3 event.

You basically send the S3 event to SNS and subscribe your 30 Lambda functions to the topic, so they all trigger from the SNS notification (containing the details of the S3 event) when it is published.

  1. Create the Topic
  2. Update the Topic Policy to allow Event Notifications from an S3 Bucket
  3. Configure the S3 Bucket to send Event Notifications to the SNS Topic
  4. Create the parallel Lambda functions, one for each job
  5. Modify the Lambda functions to process SNS messages of S3 event notifications instead of the S3 event itself
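Step 5 is the only code change: the S3 event arrives wrapped inside the SNS message body as a JSON string, so each Lambda must unwrap it before acting on it. A minimal sketch of one such function, assuming boto3 in the Lambda runtime; the job name `"my-glue-job"` and the argument names are placeholders, and each of the 30 functions would start its own job:

```python
import json


def extract_s3_records(sns_event):
    """Unwrap the SNS envelope: each SNS record's Message field holds the
    original S3 event notification as a JSON string."""
    s3_records = []
    for record in sns_event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        s3_records.extend(message.get("Records", []))
    return s3_records


def lambda_handler(event, context):
    # boto3 is provided by the Lambda runtime; imported here so the
    # parsing helper above stays testable without AWS dependencies.
    import boto3

    glue = boto3.client("glue")
    for s3_record in extract_s3_records(event):
        bucket = s3_record["s3"]["bucket"]["name"]
        key = s3_record["s3"]["object"]["key"]
        # "my-glue-job" is a placeholder -- each parallel Lambda
        # would start its own Glue job here.
        glue.start_job_run(
            JobName="my-glue-job",
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
```

If one `start_job_run` call fails, only that Lambda invocation errors and retries; the other 29 subscriptions are unaffected, which is what gives you the "one job fails, others continue" behaviour.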

https://aws.amazon.com/blogs/compute/fanout-s3-event-notifications-to-multiple-endpoints/

There is also another nice example with a CloudFormation template: https://aws.amazon.com/blogs/compute/messaging-fanout-pattern-for-serverless-architectures-using-amazon-sns/

Upvotes: 2
