Reputation: 33544
For example, I run an ETL job and new fields or columns may be added to the target table. To detect those table changes a crawler has to run, but crawlers only support manual or scheduled runs.
Can a crawler be triggered after the job finishes?
Upvotes: 13
Views: 5602
Reputation: 382
If you want to update the Glue Data Catalog table, you can use the code below in the job's write step; it updates the table while writing the results.
val dataSink = glueContext
  .getSink(
    connectionType = "s3",
    connectionOptions = JsonOptions(
      Map(
        "path" -> outputPath,
        "enableUpdateCatalog" -> true, // this value should be added
        "updateBehavior" -> "UPDATE_IN_DATABASE" // this value should be added
      )
    )
  )
  .withFormat(
    format = "parquet",
    options = JsonOptions(Map("useGlueParquetWriter" -> true)) // this value should be added
  )
dataSink.setCatalogInfo(catalogDatabase = databaseName, catalogTableName = tableName)
dataSink.writeDynamicFrame(frame = DynamicFrame(dataframe, glueContext))
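For PySpark jobs, a rough equivalent looks like this (a sketch only; output_path, database_name, table_name and dynamic_frame are placeholders):

sink = glueContext.getSink(
    connection_type="s3",
    path=output_path,                    # placeholder output location
    enableUpdateCatalog=True,            # update the catalog on write
    updateBehavior="UPDATE_IN_DATABASE", # propagate new columns to the table
    partitionKeys=[],
)
sink.setFormat("glueparquet")            # Glue Parquet writer, needed for catalog updates
sink.setCatalogInfo(catalogDatabase=database_name, catalogTableName=table_name)
sink.writeFrame(dynamic_frame)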
Upvotes: 0
Reputation: 48256
You can, using a trigger, but not in the trigger UI :S
With a Glue Workflow: add a Trigger to start the job, add the Job itself, add a Trigger on job success, and add the Crawler as the thing it triggers (a boto3 sketch of this wiring follows).
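A minimal sketch of that workflow wiring with boto3; the names myWorkflow, myJob and myCrawler are assumptions:

import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Workflow that ties the job and crawler together (name assumed)
glue.create_workflow(Name="myWorkflow")

# Trigger that starts the job when the workflow runs
glue.create_trigger(
    Name="start-myJob",
    WorkflowName="myWorkflow",
    Type="ON_DEMAND",
    Actions=[{"JobName": "myJob"}],
)

# Trigger that starts the crawler once the job succeeds
glue.create_trigger(
    Name="myJob-success",
    WorkflowName="myWorkflow",
    Type="CONDITIONAL",
    Predicate={
        "Logical": "ANY",
        "Conditions": [{
            "JobName": "myJob",
            "LogicalOperator": "EQUALS",
            "State": "SUCCEEDED",
        }],
    },
    Actions=[{"CrawlerName": "myCrawler"}],
    StartOnCreation=True,  # conditional triggers stay deactivated otherwise
)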
Or, using the CLI:
aws glue create-trigger --name myJob-success \
--type CONDITIONAL \
--predicate '{"Logical":"ANY","Conditions":[{"JobName":"myJob","LogicalOperator":"EQUALS","State":"SUCCEEDED"}]}' \
--actions CrawlerName=myCrawler \
--start-on-creation
or in CloudFormation:
Type: AWS::Glue::Trigger
Properties:
  Name: job_success
  Type: CONDITIONAL
  StartOnCreation: true  # activate the trigger, like --start-on-creation above
  Predicate:
    Logical: ANY
    Conditions:
      - JobName: myJob
        LogicalOperator: EQUALS
        State: SUCCEEDED
  Actions:
    - CrawlerName: myCrawler
Upvotes: 0
Reputation: 377
import boto3

# Kick off the crawler once the job has finished writing its output
glue_client = boto3.client('glue', region_name='us-east-1')
glue_client.start_crawler(Name='name_of_crawler')

Add this snippet at the end of your job script.
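Note that start_crawler raises an error if the crawler is already running; a variant that tolerates that (name_of_crawler is still a placeholder):

import boto3

glue_client = boto3.client('glue', region_name='us-east-1')
try:
    glue_client.start_crawler(Name='name_of_crawler')
except glue_client.exceptions.CrawlerRunningException:
    # Crawler is already in progress; don't fail the job over it
    pass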
Upvotes: 13