TEJASWAKUMAR

Reputation: 95

AWS Glue automatic job creation

I have a PySpark script that I can run in AWS Glue. But every time, I create a job from the UI and copy my code into the job. Is there any way I can automatically create a job from my file in an S3 bucket? (I have all the libraries and the Glue context that will be used while running.)

Upvotes: 3

Views: 8189

Answers (4)

Vincent Claes

Reputation: 4788

I created an open-source library called datajob to deploy and orchestrate Glue jobs. You can find it on GitHub at https://github.com/vincentclaes/datajob and on PyPI:

pip install datajob
npm install -g [email protected]

You create a file datajob_stack.py that describes your Glue jobs and how they are orchestrated:

from datajob.datajob_stack import DataJobStack
from datajob.glue.glue_job import GlueJob
from datajob.stepfunctions.stepfunctions_workflow import StepfunctionsWorkflow


with DataJobStack(stack_name="data-pipeline-simple") as datajob_stack:

    # here we define 3 glue jobs with a relative path to the source code.
    task1 = GlueJob(
        datajob_stack=datajob_stack,
        name="task1",
        job_path="data_pipeline_simple/task1.py",
    )
    task2 = GlueJob(
        datajob_stack=datajob_stack,
        name="task2",
        job_path="data_pipeline_simple/task2.py",
    )
    task3 = GlueJob(
        datajob_stack=datajob_stack,
        name="task3",
        job_path="data_pipeline_simple/task3.py",
    )

    # we instantiate a step functions workflow and add the sources
    # we want to orchestrate. 
    with StepfunctionsWorkflow(
        datajob_stack=datajob_stack, name="data-pipeline-simple"
    ) as sfn:
        [task1, task2] >> task3

To deploy your code to Glue, run:

export AWS_PROFILE=my-profile    
datajob deploy --config datajob_stack.py

Any feedback is much appreciated!

Upvotes: 2

Sandeep Fatangare

Reputation: 2144

I wrote a script that does the following:

  1. We keep a (glue)_dependency.txt file; the script collects the paths of all dependency files listed there and creates a zip file.
  2. It uploads the Glue script and the zip file to S3 using aws s3 sync.
  3. Optionally, if any job setting has changed, it redeploys the CloudFormation template.

You could write a shell script to do the same.
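As a rough illustration, steps 1 and 2 above can also be sketched in Python instead of shell. This is a minimal sketch, not the answerer's actual script: it assumes the dependency file lists one path per line (relative to the file's own folder, with # comments allowed), uses boto3's S3 upload_file in place of aws s3 sync, and leaves step 3 (redeploying CloudFormation) out. The file, bucket and prefix names in the __main__ block are hypothetical.

```python
import zipfile
from pathlib import Path


def read_dependency_list(dep_file):
    """Return the paths listed in dep_file, skipping blank lines and # comments."""
    paths = []
    for line in Path(dep_file).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            paths.append(Path(line))
    return paths


def build_dependency_zip(dep_file, zip_path):
    """Step 1: zip every listed file, resolving paths relative to dep_file's folder."""
    base = Path(dep_file).parent
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for rel in read_dependency_list(dep_file):
            # Keep the relative path inside the archive so packages stay importable.
            zf.write(base / rel, arcname=rel.as_posix())
    return zip_path


def upload_artifacts(s3_client, bucket, prefix, paths):
    """Step 2: upload each file to s3://bucket/prefix/<file name>; return the URIs."""
    uris = []
    for p in paths:
        key = f"{prefix.rstrip('/')}/{Path(p).name}"
        s3_client.upload_file(str(p), bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris


if __name__ == "__main__":
    import boto3  # only needed for the real upload

    # Hypothetical file, bucket and prefix names; adjust to your project.
    deps_zip = build_dependency_zip("glue_dependency.txt", "dependencies.zip")
    print(upload_artifacts(boto3.client("s3"), "my-glue-bucket", "jobs/my_job",
                           ["my_job.py", deps_zip]))
```

The S3 client is passed in rather than created inside the functions, so the upload step can be exercised with a stub. The URI of the uploaded zip is what you would typically pass to the job's --extra-py-files argument.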

Upvotes: 0

Yuriy Bondaruk

Reputation: 4750

Another alternative is to use AWS CloudFormation. You can define all the AWS resources you want to create (not only Glue jobs) in a template file and then update the stack whenever you need to, from the AWS Console or using the CLI.

The template entry for a Glue job (under the template's Resources section) would look like this:

  MyJob:
    Type: AWS::Glue::Job
    Properties:
      Command:
        Name: glueetl
        ScriptLocation: "s3://aws-glue-scripts//your-script-file.py"
      DefaultArguments:
        "--job-bookmark-option": "job-bookmark-enable"
      ExecutionProperty:
        MaxConcurrentRuns: 2
      MaxRetries: 0
      Name: cf-job1
      Role: !Ref MyJobRole # reference to a Role resource that is not shown here
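To make "update the stack whenever you need" a single repeatable command, the template can be deployed with boto3's CloudFormation client. This is a minimal sketch under a few assumptions: the fragment above is saved inside a complete template (with a top-level Resources key) in a hypothetical file glue_job.yaml, the stack name my-glue-jobs is made up, and CAPABILITY_IAM is only required if the template also creates IAM resources such as MyJobRole (use CAPABILITY_NAMED_IAM if the role gets an explicit name). Errors are recognized by their message text, rather than via botocore's ClientError, so the functions run without boto3 installed.

```python
from pathlib import Path


def _is_missing_stack(err):
    # describe_stacks fails with a ValidationError whose message says
    # "does not exist" when the stack has never been created.
    return "does not exist" in str(err)


def deploy_stack(cfn_client, stack_name, template_body, capabilities=("CAPABILITY_IAM",)):
    """Create the stack on first run, update it afterwards; return the action taken."""
    params = {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Capabilities": list(capabilities),
    }
    try:
        cfn_client.describe_stacks(StackName=stack_name)
    except Exception as err:
        if not _is_missing_stack(err):
            raise
        cfn_client.create_stack(**params)
        return "create"
    try:
        cfn_client.update_stack(**params)
    except Exception as err:
        # CloudFormation rejects an update that would change nothing.
        if "No updates are to be performed" in str(err):
            return "unchanged"
        raise
    return "update"


if __name__ == "__main__":
    import boto3

    cfn = boto3.client("cloudformation")
    body = Path("glue_job.yaml").read_text()  # hypothetical template file
    action = deploy_stack(cfn, "my-glue-jobs", body)
    if action in ("create", "update"):
        # Block until CloudFormation reports the stack operation finished.
        cfn.get_waiter(f"stack_{action}_complete").wait(StackName="my-glue-jobs")
    print(action)
```

Running it once creates the job; later runs apply any changes to the job definition. Note that editing the script itself only requires uploading a new version to the ScriptLocation path, since the template only points at it.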

Upvotes: 5
