justHelloWorld

Reputation: 6818

Step Functions - Wait until Glue Crawler is completed

In my team, we manage ETL jobs through Step Functions. Because of application requirements, we don't want to use Glue Workflows.

Most of our ETL jobs (i.e., step functions) are of the type:

Run Crawler on Data Source -> Execute Glue Job -> Run Crawler on Data Target 

Now, I know that I can use the .sync integration for AWS Glue jobs (ref), but not for Glue Crawlers. My question is: how do I make a Step Function wait until the Crawler is done?

I thought about two solutions:

  1. A dedicated Lambda periodically checks Crawler state. This is highly inefficient.
  2. The Step Function waits for a CloudWatch event about a change in the Crawler state (i.e., "Succeeded" or "Failed"). The issue is that I don't know how to implement this.

Upvotes: 1

Views: 2415

Answers (2)

DSC

Reputation: 403

You could replace the Glue Crawler step in your state machine with a Lambda function that triggers the Glue Crawler and keeps running until the crawler is finished. Then you invoke this Lambda function synchronously from your state machine.

See this other question on how to implement this approach in Python via the boto3 library. You can adapt the code to be used in an AWS Lambda function: Wait until AWS Glue crawler has finished running
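
A minimal sketch of such a Lambda, assuming the state machine passes the crawler name in the input under a key called `crawler_name` (a name I made up for illustration). It relies on the boto3 Glue calls `start_crawler` and `get_crawler`; the poll interval and timeout defaults are arbitrary choices, not recommendations:

```python
import time


def wait_for_crawler(glue, crawler_name, poll_seconds=15,
                     timeout_seconds=840, sleep=time.sleep):
    """Start the crawler, then poll until its state returns to READY.

    `glue` is a boto3 Glue client (or any object exposing the same two
    methods). Returns the status of the last crawl, e.g. "SUCCEEDED".
    """
    glue.start_crawler(Name=crawler_name)
    waited = 0
    while waited < timeout_seconds:
        sleep(poll_seconds)
        waited += poll_seconds
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler goes RUNNING -> STOPPING -> READY once it is done.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
    raise TimeoutError(f"Crawler {crawler_name} still running after {waited}s")


def lambda_handler(event, context):
    # Imported here so the helper above can be exercised without AWS access.
    import boto3

    # "crawler_name" is an assumed input key, not something Glue defines.
    name = event["crawler_name"]
    status = wait_for_crawler(boto3.client("glue"), name)
    if status != "SUCCEEDED":
        # Raising makes the Step Functions task fail, so the state machine
        # can route to a Catch or Retry block.
        raise RuntimeError(f"Crawler {name} finished with status {status}")
    return {"crawler_name": name, "status": status}
```

Keep in mind that a Lambda invocation is capped at 15 minutes, so this only fits crawlers that reliably finish within that window, and you pay for the time the function spends sleeping.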

Upvotes: 0

Robert Kossendey

Reputation: 6998

You can use EventBridge for that. Glue emits a "Glue Crawler State Change" event to EventBridge, and a rule matching that event can then trigger something in Step Functions.
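
One way this could be wired up, as a rough sketch rather than a definitive design: the state machine starts the crawler from a task that uses the `.waitForTaskToken` callback pattern, and an EventBridge rule with the pattern below invokes a small Lambda that resumes the execution. How the task token is stored and looked up is left open; `load_task_token` is a hypothetical placeholder you would have to implement yourself (e.g. against a table keyed by crawler name). The error name `CrawlerFailed` is also my own choice.

```python
import json

# Event pattern for the EventBridge rule: match crawlers that have finished.
CRAWLER_FINISHED_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded", "Failed"]},
}


def resume_execution(sfn, task_token, event):
    """Report the crawler result back to the waiting Step Functions task.

    `sfn` is a boto3 Step Functions client (or an object with the same
    send_task_success / send_task_failure methods).
    """
    detail = event["detail"]
    if detail["state"] == "Succeeded":
        sfn.send_task_success(
            taskToken=task_token,
            output=json.dumps({"crawlerName": detail["crawlerName"],
                               "state": detail["state"]}),
        )
        return "success"
    sfn.send_task_failure(
        taskToken=task_token,
        error="CrawlerFailed",  # assumed error name, pick your own
        cause=detail.get("errorMessage", "Crawler did not succeed"),
    )
    return "failure"


def load_task_token(crawler_name):
    # Hypothetical: fetch the token that the state machine saved when it
    # started this crawler. The storage mechanism is up to you.
    raise NotImplementedError("token lookup is not part of this sketch")


def lambda_handler(event, context):
    import boto3  # imported lazily so resume_execution is testable offline

    token = load_task_token(event["detail"]["crawlerName"])
    return resume_execution(boto3.client("stepfunctions"), token, event)
```

Compared with polling, nothing runs while the crawler is working; the trade-off is the extra bookkeeping needed to map a crawler event back to the right task token.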

Upvotes: 2
