Cherry
Cherry

Reputation: 33544

Is there a way to run aws glue crawler after job is finished?

For example I run ETL and new fields or columns may be added for target table. To detect table changes a crawler should be run but it has only manual or schedule run.

Can crawler be triggered after job is finished?

Upvotes: 13

Views: 5602

Answers (3)

Amjad Tubasi
Amjad Tubasi

Reputation: 382

If you want to update glue data catalog table, you can use the below code in the job write in order update the table while writing the results.

    val dataSink = glueContext
      .getSink(
        connectionType = "s3",
        connectionOptions = JsonOptions(
          Map(
            "pat" -> outputPath,         
            "enableUpdateCatalog" -> true, // this value should be added
            "updateBehavior" -> "UPDATE_IN_DATABASE" // this value should be added
          )
        )
      )
      .withFormat(
        format = "parquet",
        options = JsonOptions(Map("useGlueParquetWriter" -> true)) // this value should be added
      )


dataSink.setCatalogInfo(catalogDatabase = databaseName, catalogTableName = tableName)

dataSink.writeDynamicFrame(frame = DynamicFrame(dataframe, glueContext))

Upvotes: 0

Neil McGuigan
Neil McGuigan

Reputation: 48256

You can, using a trigger, but not in the trigger UI :S

With a Glue Workflow: Add a Trigger to start a job, add a Job, add a Trigger for job success, add a Crawler for what is triggered

enter image description here

Or, using the CLI:

aws glue create-trigger --name myJob-success \
    --type CONDITIONAL \
    --predicate '{"Logical":"ANY","Conditions":[{"JobName":"myJob","LogicalOperator":"EQUALS","State":"SUCCEEDED"}]}' \
    --actions CrawlerName=myCrawler \
    --start-on-creation

or in CloudFormation:

Type: AWS::Glue::Trigger
Properties: 
  Name: job_success
  Type: CONDITIONAL
  Predicate: 
    Logical: ANY
    Conditions:
      - JobName: myJob
        LogicalOperator: EQUALS
        State: SUCCEEDED
  Actions: 
    - CrawlerName:myCrawler

Upvotes: 0

Ashutosh
Ashutosh

Reputation: 377

import boto3
glue_client = boto3.client('glue', region_name='us-east-1')
glue_client.start_crawler(Name='name_of_crawler')

Copy this code snippet at the end of your code.

Upvotes: 13

Related Questions