Reputation: 6818
In my team, we manage ETL jobs through Step Functions. As an application requirement, we don't want to use Glue Workflows.
Most of our ETL jobs (i.e., Step Functions state machines) follow this pattern:
Run Crawler on Data Source -> Execute Glue Job -> Run Crawler on Data Target
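For reference, the pipeline above could be written as an Amazon States Language definition like the sketch below, built here as a Python dict. The state names, the resource names and the function build_etl_definition are hypothetical. The Glue job task uses the startJobRun.sync integration, so it waits for the job run to finish. The crawler tasks use the AWS SDK integration aws-sdk:glue:startCrawler, which returns as soon as the crawler has been started and does not wait for the crawl to complete.

```python
import json


def build_etl_definition(source_crawler: str, glue_job: str, target_crawler: str) -> dict:
    """Hypothetical ASL definition: crawl source -> run Glue job -> crawl target."""
    return {
        "Comment": "Run Crawler on Data Source -> Execute Glue Job -> Run Crawler on Data Target",
        "StartAt": "CrawlSource",
        "States": {
            "CrawlSource": {
                "Type": "Task",
                # AWS SDK integration: returns once the crawler has been started,
                # not when the crawl is finished.
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": source_crawler},
                "Next": "RunGlueJob",
            },
            "RunGlueJob": {
                "Type": "Task",
                # .sync: this state waits until the job run completes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Next": "CrawlTarget",
            },
            "CrawlTarget": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": target_crawler},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical crawler and job names, for illustration only.
    print(json.dumps(build_etl_definition("source-crawler", "my-etl-job", "target-crawler"), indent=2))
```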
I know that I can use the .sync integration pattern for AWS Glue jobs (ref), but not for Glue crawlers. My question is: how do I make a Step Function wait until the crawler has finished?
I thought about two solutions:
Upvotes: 1
Views: 2415
Reputation: 403
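You could replace the Glue crawler step in your state machine with a Lambda function that starts the crawler and keeps running until the crawler has finished. You then invoke this Lambda function synchronously from your state machine. Keep in mind that a Lambda function can run for at most 15 minutes, so this only works for crawls that finish within that limit.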
See this other question for how to implement this approach in Python with the boto3 library; the code can be adapted to run in an AWS Lambda function: Wait until AWS Glue crawler has finished running
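A minimal sketch of such a Lambda function, assuming the boto3 Glue client's start_crawler and get_crawler calls. The function names, the event key "crawler_name", the 15-second poll interval and the 840-second timeout (kept below Lambda's 15-minute limit) are all hypothetical choices. The Glue client is passed in so the polling logic can be exercised without AWS.

```python
import time


def run_crawler_and_wait(glue, crawler_name, poll_seconds=15, timeout_seconds=840,
                         sleep=time.sleep, clock=time.monotonic):
    """Start a Glue crawler and block until it is back in the READY state.

    Returns the status of the last crawl (e.g. "SUCCEEDED", "FAILED", "CANCELLED").
    """
    glue.start_crawler(Name=crawler_name)
    deadline = clock() + timeout_seconds
    while True:
        # Wait before each poll so the state is not read before the crawl has begun.
        sleep(poll_seconds)
        crawler = glue.get_crawler(Name=crawler_name)["Crawler"]
        # A crawler cycles READY -> RUNNING -> STOPPING -> READY.
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status", "UNKNOWN")
        if clock() >= deadline:
            raise TimeoutError(
                f"crawler {crawler_name!r} still {crawler['State']} after {timeout_seconds}s")


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime; it is imported here so the
    # polling helper above can be tested without it.
    import boto3

    # "crawler_name" is a hypothetical input key supplied by the state machine.
    crawler_name = event["crawler_name"]
    status = run_crawler_and_wait(boto3.client("glue"), crawler_name)
    if status != "SUCCEEDED":
        # Raising makes the Task state fail, so the state machine can Retry/Catch it.
        raise RuntimeError(f"crawler {crawler_name!r} finished with status {status}")
    return {"crawler_name": crawler_name, "status": status}
```

In the state machine, this function would be called from a Task state using the arn:aws:states:::lambda:invoke integration, which waits for the function to return. The function's configured timeout also has to be raised from the default of a few seconds to cover the polling loop.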
Upvotes: 0
Reputation: 6998
You can use EventBridge for that. EventBridge emits a "Glue Crawler State Change" event whenever a crawler changes state, and a rule matching that event can trigger a Step Functions state machine.
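One way to apply this, sketched under assumptions: end the first state machine right after it starts the crawler, and let an EventBridge rule start the next state machine when that crawler succeeds. The code assumes the boto3 EventBridge client calls put_rule and put_targets; the event field names (crawlerName, state, value "Succeeded") should be checked against the AWS Glue EventBridge documentation. The function names, rule name and target Id are hypothetical, and role_arn must be an IAM role that EventBridge can assume to call states:StartExecution.

```python
import json


def crawler_succeeded_pattern(crawler_name):
    """EventBridge event pattern matching a successful run of one Glue crawler."""
    return {
        "source": ["aws.glue"],
        "detail-type": ["Glue Crawler State Change"],
        "detail": {"crawlerName": [crawler_name], "state": ["Succeeded"]},
    }


def route_crawler_success_to_state_machine(events, rule_name, crawler_name,
                                           state_machine_arn, role_arn):
    """Create (or update) a rule that starts a state machine when the crawler succeeds.

    `events` is an EventBridge client, e.g. boto3.client("events").
    """
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(crawler_succeeded_pattern(crawler_name)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        # "next-state-machine" is a hypothetical target Id.
        Targets=[{"Id": "next-state-machine", "Arn": state_machine_arn, "RoleArn": role_arn}],
    )
```

By default the matched event is passed as the input of the started execution, so the next state machine can read the crawler name from it.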
Upvotes: 2