Benoit Hugonnard

Reputation: 73

Is it possible to update only part of a Glue Job using AWS CLI?

I want my CI/CD pipeline to update a job's script_location, and only that parameter. AWS asks me to include the other required parameters as well, such as RoleArn. How can I update only the part of the job configuration I want to change?

This is what I am trying to use:

aws glue update-job --job-name <job_name> --job-update Command="{ScriptLocation=s3://<s3_path_to_script>}"

This is what happens:

An error occurred (InvalidInputException) when calling the UpdateJob operation: Command name should not be null or empty.

If I add the default command name, glueetl, this is what happens:

An error occurred (InvalidInputException) when calling the UpdateJob operation: Role should not be null or empty.

Upvotes: 4

Views: 8461

Answers (3)

gkimer

Reputation: 61

An easy way to update a Glue job or a Glue trigger from the CLI is the --cli-input-json option. To get the correct JSON structure, run aws glue update-job --generate-cli-skeleton, which returns a complete skeleton that you can fill in with your changes.

Example:

{"JobName":"","JobUpdate":{"Description":"","LogUri":"","Role":"","ExecutionProperty":{"MaxConcurrentRuns":0},"Command":{"Name":"","ScriptLocation":"","PythonVersion":""},"DefaultArguments":{"KeyName":""},"NonOverridableArguments":{"KeyName":""},"Connections":{"Connections":[""]},"MaxRetries":0,"AllocatedCapacity":0,"Timeout":0,"MaxCapacity":null,"WorkerType":"G.1X","NumberOfWorkers":0,"SecurityConfiguration":"","NotificationProperty":{"NotifyDelayAfter":0},"GlueVersion":""}}

Fill in the name of the job and change the options you need. Then collapse the JSON onto a single line and pass it to the command wrapped in single quotes:

aws glue update-job --cli-input-json '<one-line-json>'
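
The "fill in, then collapse to one line" step can also be scripted. The following is a minimal Python sketch, not part of the original answer: the helper names prune and to_cli_input_json, the job name, the role name and the bucket path are all hypothetical. It drops fields that were left as empty placeholders and prints compact single-line JSON.

```python
import json

# Values treated as "left blank" placeholders (an assumption of this sketch).
EMPTY = ("", None, {}, [])


def prune(value):
    """Recursively drop keys and items whose value is an empty placeholder."""
    if isinstance(value, dict):
        cleaned = {k: prune(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in EMPTY}
    if isinstance(value, list):
        return [v for v in map(prune, value) if v not in EMPTY]
    return value


def to_cli_input_json(job_name, job_update):
    """Return single-line JSON in the shape expected by --cli-input-json."""
    payload = prune({"JobName": job_name, "JobUpdate": job_update})
    return json.dumps(payload, separators=(",", ":"))


if __name__ == "__main__":
    # Hypothetical values; replace with your own job, role and script path.
    print(to_cli_input_json(
        "my-job",
        {
            "Description": "",
            "Role": "my-role",
            "Command": {
                "Name": "glueetl",
                "ScriptLocation": "s3://my-bucket/scripts/job.py",
                "PythonVersion": "",
            },
        },
    ))
```

The printed string can be pasted between the single quotes of the command above. Numeric zeros are kept on purpose, so any 0 left over from the skeleton should be removed or set deliberately before sending.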

I hope this helps someone else with the same problem.

Upvotes: 5

Boris Branson

Reputation: 31

I don't know whether you've solved this problem, but I managed using this command:

aws glue update-job --job-name <gluejobname> --job-update Role=myRoleNameBB,Command="{Name=<someupdatename>,ScriptLocation=<local_filename.py>}"

You don't need the ARN of the role; the role name is enough. The example above assumes that you have a role named myRoleNameBB that has access to AWS Glue.

Note: I used a local file on my laptop. The "Name" in the "Command" part is also compulsory.

When I run it, I get this output:

{
    "JobName": "<gluejobname>"
}

Upvotes: 3

arjunj

Reputation: 1516

Based on what I have found, there is no way to update just part of the job using the update-job API.

I ran into the same issue and provided the role to get past this error. The command worked, but the update-job API reset other parameters to their defaults, such as Type of application, Job Language, Class, Timeout, Max Capacity, etc.

So if your pre-existing job is a Spark application in Scala, it will fail, because the update-job API defaults to Python Shell as the application type and Python as the job language. The API provides no way to set the job language to Scala or to set a main class (required for Scala). It does provide a way to set the application type to Spark application.
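
One way to limit the reset described above is to read the current job definition first and send it back with only the script location changed. The following is a rough Python sketch under stated assumptions, not a confirmed fix: glue is expected to be a boto3 Glue client (for example boto3.client("glue")), the helper names are hypothetical, and the list of fields to strip is an assumption based on the fields that get_job returns but update_job does not accept.

```python
# Fields returned by get_job that JobUpdate does not accept
# (assumption: this list may need adjusting for your Glue version).
READ_ONLY_FIELDS = ("Name", "CreatedOn", "LastModifiedOn")


def build_job_update(job, script_location):
    """Copy an existing job definition, changing only Command.ScriptLocation."""
    update = {k: v for k, v in job.items() if k not in READ_ONLY_FIELDS}
    # AllocatedCapacity is deprecated and conflicts with MaxCapacity.
    update.pop("AllocatedCapacity", None)
    # MaxCapacity cannot be combined with WorkerType / NumberOfWorkers.
    if "WorkerType" in update or "NumberOfWorkers" in update:
        update.pop("MaxCapacity", None)
    update["Command"] = {**update.get("Command", {}), "ScriptLocation": script_location}
    return update


def update_script_location(glue, job_name, script_location):
    """Fetch the job, then write it back with the new script location."""
    job = glue.get_job(JobName=job_name)["Job"]
    return glue.update_job(
        JobName=job_name,
        JobUpdate=build_job_update(job, script_location),
    )
```

Because the whole existing definition is sent back, the role, command name and other settings are preserved rather than left to defaults. Whether every setting survives the round trip should be checked against your own job before relying on this in CI.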

If you do not want to pass the Role to the update-job API, one approach is to copy the new script to the same name and location that your pre-existing ETL job uses, and then trigger the job with the start-job-run API as part of the CI process.

The second approach is to run your ETL job directly and force it to use the latest script in the start-job-run API call:

aws glue start-job-run --job-name <job-name> --arguments=scriptLocation="<path to your latest script>"

The only caveat with the second approach is that the console will still show the ETL job referencing the old script location. The above command only forces this run of the job to use the latest script, which you can confirm in the History tab of the Glue ETL console.
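
The same per-run override can be sketched in Python. This is an illustration under assumptions, not a tested recipe: glue is expected to be a boto3 Glue client, the function name is hypothetical, and the argument key is written as --scriptLocation, the form AWS lists among its special job parameters, whereas the CLI example above writes it without the leading dashes. Verify the exact key against your own setup.

```python
def start_run_with_script(glue, job_name, script_location):
    """Start one job run that overrides the script location for that run only."""
    response = glue.start_job_run(
        JobName=job_name,
        # Assumed key name; the job definition itself is left untouched.
        Arguments={"--scriptLocation": script_location},
    )
    return response["JobRunId"]
```

The returned run ID can be passed to get_job_run to poll the run's status from the CI process.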

Upvotes: 2
