Reputation: 63
I have a Step Function with a Map to run 5 parallel Glue Jobs with custom arguments, something like this:
"Run Glue Jobs": {
"Type": "Map",
"MaxConcurrency": 5,
"ItemsPath": "$.payload",
"Iterator": {
"StartAt": "Run Generic Glue Job",
"States": {
"Run Generic Glue Job": {
"Type": "Task",
"Resource": "arn:aws:states:::glue:startJobRun.sync",
"Parameters": {
"JobName": "GlueJobName",
"Arguments": {
"--target_bucket": "target-bucket",
"--target_path": "dir1/dir2/",
"--job-language": "python",
"--job-bookmark-option": "job-bookmark-disable",
"--TempDir": "s3://temp-bucket/tmp-glue",
"--continuous-log-logGroup": "gluecloudwatch",
"--enable-continuous-cloudwatch-log": "true",
"--enable-continuous-log-filter": "true",
"--enable-metrics": "",
"--kmskeyid": "arn:aws:kms:region:12345678901:alias/glue-kms-key"
}
},
"End": true
}
}
},
"Next": "Finish"
When I run this stage of the step function, it catches errors, namely: Glue.ConcurrentRunsExceededException Is there a way to pass a parameter somewhere like MaxConcurrentRuns:5 (ExecutionProperty) so I can run up to 5 jobs simultaneously ? I can't find anywhere a way to do that. Only this unrelated resource: https://docs.aws.amazon.com/cli/latest/reference/glue/create-job.html
I can only do that manually Editing the job in the GUI, but I need to create everything from scratch in Terraform so I need to also enable 5 MaxConcurrentRuns from a written source. Any tips? Thank you.
Upvotes: 2
Views: 7033
Reputation: 63
So apparently there's a way to specify max concurrent runs right from Terraform in a resource "aws_glue_job", it's the following:
execution_property {
max_concurrent_runs = 5
}
It's probably solved this way. I'll try to use that.
Upvotes: 1
Reputation: 10393
We can't set Glue Max Concurrent Runs from Step Functions. If Step function Map is run with MaxConcurrency
5, we need to create/update glue job max concurrent runs to minimum 5 as well.
When we create Glue Job from AWS CLI, we can pass MaxConcurrentRuns
as ExecutionProperty.MaxConcurrentRuns
Here is a sample json
{
"Name": "my-glue-job",
"Role": "arn:aws:iam::111122223333:role/glue_etl_service_role",
"ExecutionProperty": {
"MaxConcurrentRuns": 5
},
"Command": {
"Name": "glueetl",
"ScriptLocation": "s3://temp-sandbox/code/scripts/MyGlueScript.scala",
"PythonVersion": "3"
},
"DefaultArguments": {
"--TempDir": "s3://aws-glue-temporary-111122223333-us-east-1/admin",
"--class": "com.mycompany.corp.MainClass",
"--enable-continuous-cloudwatch-log": "true",
"--enable-metrics": "",
"--enable-spark-ui": "true",
"--extra-jars": "s3://temp-sandbox/code/jars/MyExtraJar.jar",
"--job-bookmark-option": "job-bookmark-disable",
"--job-language": "scala",
"--spark-event-logs-path": "s3://aws-glue-assets-111122223333-us-east-1/sparkHistoryLogs/"
},
"MaxRetries": 0,
"Timeout": 2880,
"WorkerType": "G.1X",
"NumberOfWorkers": 10,
"GlueVersion": "2.0"
}
with cli
aws glue create-job --cli-input-json file://myFolder/glue_job_props.json
Upvotes: 2