This is my requirement: I have a crawler and a PySpark job in AWS Glue, and I have to set up the workflow using Step Functions.
Here is the configuration you need. Make sure you add the rest of it yourself: at the end I have used ... to show that the definition continues.
{
  "StartAt": "start_crawler",
  "States": {
    "start_crawler": {
      "Type": "Task",
      "Parameters": {
        "Name": "crawler"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
      "Next": "crawler_info",
      "Retry": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "BackoffRate": 2,
          "IntervalSeconds": 10,
          "MaxAttempts": 2
        }
      ]
    },
    "crawler_info": {
      "Type": "Task",
      "Next": "crawler_status",
      "Parameters": {
        "Name": "crawler"
      },
      "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
      "Retry": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "BackoffRate": 2,
          "IntervalSeconds": 10,
          "MaxAttempts": 3
        }
      ]
    },
    "crawler_status": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.Crawler.State",
          "StringEquals": "RUNNING",
          "Next": "crawler_finish_wait"
        },
        {
          "Variable": "$.Crawler.State",
          "StringEquals": "STOPPING",
          "Next": "crawler_finish_wait"
        },
        {
          "Variable": "$.Crawler.LastCrawl.Status",
          "StringEquals": "FAILED",
          "Next": "crawler_failed"
        },
        {
          "Variable": "$.Crawler.LastCrawl.Status",
          "StringEquals": "SUCCEEDED",
          "Next": "glue_job"
        }
      ],
      "Default": "glue_job"
    },
    "crawler_finish_wait": {
      "Type": "Wait",
      "Seconds": 10,
      "Next": "crawler_info"
    },
    "crawler_failed": {
      "Type": "Fail"
    },
    "glue_job": {
      "Type": "Task",
      ...
    }
    ...
}
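Because the `glue_job` state is elided above, here is a minimal Python sketch that assembles one complete version of the definition as a dict. It assumes the job is started with the Step Functions Glue integration `arn:aws:states:::glue:startJobRun.sync` (the `.sync` suffix makes the execution wait for the job run to finish). The job name `my_pyspark_job` is hypothetical, and `transitions_are_valid` is a small illustrative helper (not an AWS API) that catches mistakes such as a `StartAt` that names a state which does not exist.

```python
import json

# Hypothetical names: replace with your own crawler and Glue job names.
CRAWLER_NAME = "crawler"
JOB_NAME = "my_pyspark_job"


def build_definition(crawler_name: str, job_name: str) -> dict:
    """Return an Amazon States Language definition: crawl, poll, then run the job."""
    retry = [{
        "ErrorEquals": ["States.ALL"],
        "BackoffRate": 2,
        "IntervalSeconds": 10,
        "MaxAttempts": 2,
    }]
    return {
        "StartAt": "start_crawler",
        "States": {
            "start_crawler": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "Retry": retry,
                "Next": "crawler_info",
            },
            "crawler_info": {
                "Type": "Task",
                "Resource": "arn:aws:states:::aws-sdk:glue:getCrawler",
                "Parameters": {"Name": crawler_name},
                "Retry": retry,
                "Next": "crawler_status",
            },
            "crawler_status": {
                "Type": "Choice",
                # Check RUNNING/STOPPING first: LastCrawl still holds the
                # previous run's status while a new crawl is in progress.
                "Choices": [
                    {"Variable": "$.Crawler.State", "StringEquals": "RUNNING",
                     "Next": "crawler_finish_wait"},
                    {"Variable": "$.Crawler.State", "StringEquals": "STOPPING",
                     "Next": "crawler_finish_wait"},
                    {"Variable": "$.Crawler.LastCrawl.Status", "StringEquals": "FAILED",
                     "Next": "crawler_failed"},
                ],
                "Default": "glue_job",
            },
            "crawler_finish_wait": {"Type": "Wait", "Seconds": 10, "Next": "crawler_info"},
            "crawler_failed": {"Type": "Fail"},
            "glue_job": {
                # .sync: Step Functions waits until the job run completes.
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "End": True,
            },
        },
    }


def transitions_are_valid(definition: dict) -> bool:
    """Check that StartAt and every Next/Default target names an existing state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        targets += [state[key] for key in ("Next", "Default") if key in state]
        targets += [choice["Next"] for choice in state.get("Choices", [])]
    return all(target in states for target in targets)


definition = build_definition(CRAWLER_NAME, JOB_NAME)
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the Step Functions console, or passed as the `definition` argument of boto3's `stepfunctions` client `create_state_machine` call.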
To run it on a schedule, use an EventBridge Scheduler schedule that targets the state machine :)
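A minimal sketch of such a schedule with boto3's EventBridge Scheduler client, assuming a daily run at 02:00 UTC. The schedule name, cron expression and ARNs are placeholders, and the client is passed in so the request can be checked without AWS access. The role given in `RoleArn` is assumed to allow `states:StartExecution` on the state machine.

```python
import json


def build_schedule_request(state_machine_arn: str, role_arn: str,
                           expression: str = "cron(0 2 * * ? *)") -> dict:
    """Keyword arguments for EventBridge Scheduler's CreateSchedule call."""
    return {
        "Name": "glue-crawler-then-job",  # hypothetical schedule name
        "ScheduleExpression": expression,  # default: every day at 02:00 UTC
        "FlexibleTimeWindow": {"Mode": "OFF"},  # fire at the exact time
        "Target": {
            "Arn": state_machine_arn,  # state machine to start
            "RoleArn": role_arn,  # role Scheduler assumes to call StartExecution
            "Input": json.dumps({}),  # execution input for the state machine
        },
    }


def create_schedule(client, state_machine_arn: str, role_arn: str) -> dict:
    """Create the schedule using a boto3 'scheduler' client passed in by the caller."""
    return client.create_schedule(**build_schedule_request(state_machine_arn, role_arn))
```

In real use you would call `create_schedule(boto3.client("scheduler"), state_machine_arn, role_arn)` with your own ARNs.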
A few months late to answer this, but this can be done from within the Step Function itself. You can create the following states to achieve it:
TriggerCrawler: Task state. Invokes a Lambda function; in this function you can write code that starts the AWS Glue crawler using any of the AWS SDKs.
PollCrawlerStatus: Task state. A Lambda function that polls for the crawler status and returns it as the Lambda response.
IsCrawlerRunSuccessful: Choice state. Based on the status of the Glue crawler, this state either goes to the next state, which triggers your Glue job (once the crawler state is 'READY'), or goes to the Wait state for a few seconds before you poll for the status again.
RunGlueJob: Task state. A Lambda function that triggers the Glue job.
WaitForCrawler: Wait state. Waits for 'n' seconds before you poll for the status again.
Finish: Succeed state.
Here is how this Step Function will look: [Figure: diagram of the state machine described above]
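A minimal Python sketch of the three Lambda functions described above, using boto3's Glue client (`start_crawler`, `get_crawler`, `start_job_run`). The event keys `crawler_name` and `job_name`, the handler names, and the `CrawlerFailed` branch are assumptions for illustration. `next_state` only mirrors what the IsCrawlerRunSuccessful Choice state would decide. It also checks `LastCrawl.Status` because a crawler returns to 'READY' whether its last run succeeded or failed.

```python
def _default_glue():
    # Imported lazily so the module can be loaded (and tested) without boto3.
    import boto3
    return boto3.client("glue")


def trigger_crawler_handler(event, context, glue=None):
    """TriggerCrawler: start the crawler and pass the event through."""
    glue = glue if glue is not None else _default_glue()
    glue.start_crawler(Name=event["crawler_name"])
    return dict(event)


def poll_crawler_status_handler(event, context, glue=None):
    """PollCrawlerStatus: return the crawler state and the status of its last crawl."""
    glue = glue if glue is not None else _default_glue()
    crawler = glue.get_crawler(Name=event["crawler_name"])["Crawler"]
    return {
        **event,
        "state": crawler["State"],  # READY, RUNNING or STOPPING
        # LastCrawl is absent until the crawler has finished at least one run.
        "last_crawl_status": crawler.get("LastCrawl", {}).get("Status"),
    }


def next_state(status):
    """Decision the IsCrawlerRunSuccessful Choice state would make."""
    if status["state"] in ("RUNNING", "STOPPING"):
        return "WaitForCrawler"
    if status["last_crawl_status"] == "SUCCEEDED":
        return "RunGlueJob"
    return "CrawlerFailed"  # hypothetical Fail state, not in the list above


def run_glue_job_handler(event, context, glue=None):
    """RunGlueJob: start the PySpark job and return its run id."""
    glue = glue if glue is not None else _default_glue()
    run = glue.start_job_run(JobName=event["job_name"])
    return {**event, "job_run_id": run["JobRunId"]}
```

In the state machine, the Choice state would test `$.state` and `$.last_crawl_status` on PollCrawlerStatus's output. If you invoke the functions through the `lambda:invoke` integration, the result is wrapped in a `Payload` field, so add `"OutputPath": "$.Payload"` to those Task states.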