Reputation: 42416
I am following this doc https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ to set up an automatic trigger for a Lambda function when a Glue crawler finishes. The event pattern I set in CloudWatch is:
{
  "detail": {
    "crawlerName": [
      "reddit_movie"
    ],
    "state": [
      "Succeeded"
    ]
  },
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "source": [
    "aws.glue"
  ]
}
I then added a Lambda function as the target for this rule in CloudWatch.
I triggered the crawler manually, but the Lambda function was not invoked after the crawler finished. In the crawler log I can see:
04:36:28 [6c8450a5-970a-4190-bd2b-829a82d67fdf] INFO : Table redditmovies_bb008c32d0d970f0465f47490123f749 in database video has been updated with new schema
04:36:30 [6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Finished writing to Catalog
04:37:37 [6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Crawler has finished running and is in state READY
Does the log above mean the crawler finished successfully? How can I find out why the Lambda function was not triggered by the crawler?
How can I debug this issue, and which log should I look at?
Upvotes: 0
Views: 4803
Reputation: 76
At first I followed the link https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ and it didn't work. I found that this is because the Python Lambda script in the link is not correct if you paste it directly. Please check your Lambda function.
The Python Lambda code as copied from the link:
import boto3
client = boto3.client('glue')
def lambda_handler(event, context):
response = client.start_job_run(JobName = 'MyTestJob')
It needs to be fixed as below (the body of lambda_handler must be indented):
import boto3
client = boto3.client('glue')

def lambda_handler(event, context):
    response = client.start_job_run(JobName = 'MyTestJob')
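To see whether the function is invoked at all, it can help to log the incoming event before starting the job. Below is a minimal sketch of that idea, not the code from the AWS article: `make_handler` is a hypothetical helper I introduce so the Glue client can be passed in (and swapped for a stub when testing locally). Anything the handler prints should show up in the function's CloudWatch log group, normally /aws/lambda/<function name>; if nothing appears there, the rule probably never invoked the function.

```python
import json


def make_handler(glue_client, job_name):
    """Build a Lambda handler that logs the event and starts a Glue job.

    make_handler is a hypothetical helper for this sketch; glue_client is
    expected to behave like boto3.client('glue').
    """
    def lambda_handler(event, context):
        # Printing the event makes it visible in the function's logs.
        print('Received event: {0}'.format(json.dumps(event)))
        response = glue_client.start_job_run(JobName=job_name)
        # start_job_run returns a dict that includes the new run's ID.
        print('Started job run: {0}'.format(response['JobRunId']))
        return response['JobRunId']
    return lambda_handler


# In the deployed function you would wire it up roughly like this:
# import boto3
# lambda_handler = make_handler(boto3.client('glue'), 'MyTestJob')
```

'MyTestJob' is the job name used in the article; replace it with your own job name.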
Upvotes: 0
Reputation: 264
The following works -
CloudWatch Events rule -
{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ]
  }
}
Sample Lambda -
import boto3

# Glue client used to look up the crawler named in the event
glue = boto3.client('glue')

def lambda_handler(event, context):
    try:
        if event and 'detail' in event and event['detail'] and 'crawlerName' in event['detail']:
            crawler_name = event['detail']['crawlerName']
            print('Received event from crawlerName - {0}'.format(crawler_name))
            crawler = glue.get_crawler(Name=crawler_name)
            print('Received crawler from glue - {0}'.format(str(crawler)))
            database = crawler['Crawler']['DatabaseName']
    except Exception as e:
        print('Error handling events from crawler. Details - {0}'.format(e))
        raise e
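If the Lambda is still not invoked, one thing worth checking is whether the rule's pattern can match the event at all. Here is a rough local check, assuming only the exact-match rules used in this thread (a list means "equal to one of these values", a nested object means "look inside"). The `matches` function is my own simplified approximation, not the real matching engine, and the sample event is an illustrative one trimmed to the fields the pattern looks at.

```python
def matches(pattern, event):
    """Simplified check: does the event satisfy an exact-match pattern?"""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event value must be an object that matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # List of allowed values: exact, case-sensitive comparison.
            return False
    return True


# Pattern from the question.
pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"crawlerName": ["reddit_movie"], "state": ["Succeeded"]},
}

# Illustrative event (assumed shape, trimmed to the matched fields).
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "reddit_movie", "state": "Succeeded"},
}

print(matches(pattern, sample_event))
```

Since the comparison is exact, a crawler name that differs in even one character from the one in the pattern will not match. The rule above leaves out crawlerName, so it fires for any crawler in the account.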
Upvotes: 2