Joey Yi Zhao

Reputation: 42416

How to get a Glue crawler event state?

I am following this doc https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ to set up an automatic trigger for a Lambda function when a crawler finishes. The event pattern I set in CloudWatch is:

{
  "detail": {
    "crawlerName": [
      "reddit_movie"
    ],
    "state": [
      "Succeeded"
    ]
  },
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "source": [
    "aws.glue"
  ]
}

I added a Lambda function as the target for this rule in CloudWatch.

I triggered the crawler manually, but the Lambda was not invoked after the crawler finished. In the crawler log I can see:

04:36:28
[6c8450a5-970a-4190-bd2b-829a82d67fdf] INFO : Table redditmovies_bb008c32d0d970f0465f47490123f749 in database video has been updated with new schema

04:36:30
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Finished writing to Catalog

04:37:37
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Crawler has finished running and is in state READY

Does the log above mean the crawler finished successfully? How can I find out why the Lambda function is not triggered by the crawler?

How can I debug this issue? Which log should I look at?

Upvotes: 0

Views: 4803

Answers (2)

AnkyHe

Reputation: 76

At first I followed the link https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/ and it didn't work. I found that the Python Lambda script in the link is not correct if you paste it directly: the body of lambda_handler is not indented. Please check your Lambda.

The Python Lambda copied from the link:

import boto3
client = boto3.client('glue')

def lambda_handler(event, context):
response = client.start_job_run(JobName = 'MyTestJob')

It needs to be fixed as below, with the function body indented:

import boto3
client = boto3.client('glue')

def lambda_handler(event, context):
  response = client.start_job_run(JobName = 'MyTestJob')
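
The difference between the two versions can be checked locally before deploying. The following is a minimal sketch, assuming only the standard library: it compiles both versions of the source as strings (nothing is executed, so boto3 is not needed) and reports whether Python accepts them. The helper name compiles is hypothetical.

```python
# Source exactly as pasted from the article: the function body is not indented.
BROKEN = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "\n"
    "def lambda_handler(event, context):\n"
    "response = client.start_job_run(JobName = 'MyTestJob')\n"
)

# Same source with the function body indented.
FIXED = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "\n"
    "def lambda_handler(event, context):\n"
    "  response = client.start_job_run(JobName = 'MyTestJob')\n"
)


def compiles(source):
    """Return True if the source parses; compile() does not run the code."""
    try:
        compile(source, "<lambda_function>", "exec")
        return True
    except IndentationError as err:
        print("IndentationError:", err.msg)
        return False


print("pasted version compiles:", compiles(BROKEN))
print("fixed version compiles:", compiles(FIXED))
```

If the pasted version is what was deployed, the function most likely fails when it is loaded, so the error would be expected in the Lambda function's own logs rather than in the crawler log.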

Upvotes: 0

codinnvrends

Reputation: 264

The following works.

CloudWatch Event Rule:

{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ]
  }
}
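
To see why a given event does or does not match a rule like this, the matching can be approximated locally. This is a rough sketch, not the real EventBridge matcher: it assumes only exact-value matching (each pattern leaf is a list of allowed values, nested objects are matched recursively) and ignores every other pattern feature. The function name matches and the sample events are illustrative; the events contain only the fields the pattern looks at.

```python
def matches(pattern, event):
    """Simplified check: every key in the pattern must be present in the
    event, and the event's value must be one of the allowed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern (for example "detail"): recurse into the event.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: a list of allowed values.
            return False
    return True


rule = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded"]},
}

# Illustrative events, reduced to the fields the rule inspects.
succeeded = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "reddit_movie", "state": "Succeeded"},
}
failed = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "reddit_movie", "state": "Failed"},
}

print(matches(rule, succeeded))
print(matches(rule, failed))
```

Because this rule has no crawlerName filter, it matches a successful run of any crawler in the account, which is why the Lambda reads the crawler name from the event.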

Sample Lambda:

import boto3

# Glue client used to look up the crawler named in the event
glue = boto3.client('glue')

def lambda_handler(event, context):
    try:
        # Only handle events that carry a crawler name
        if event and 'detail' in event and event['detail'] and 'crawlerName' in event['detail']:
            crawler_name = event['detail']['crawlerName']
            print('Received event from crawlerName - {0}'.format(crawler_name))

            crawler = glue.get_crawler(Name=crawler_name)
            print('Received crawler from glue - {0}'.format(str(crawler)))

            # Database the crawler writes its tables to
            database = crawler['Crawler']['DatabaseName']
    except Exception as e:
        print('Error handling events from crawler. Details - {0}'.format(e))
        raise
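
The handler logic can also be exercised without AWS, which helps separate "the rule never fired" from "the Lambda fired but failed". This is a minimal sketch under stated assumptions: handle_crawler_event and FakeGlue are hypothetical names, the Glue client is passed in as a parameter so a stub can replace it, and the stub returns canned data shaped like the 'Crawler' / 'DatabaseName' fields the sample Lambda reads. The crawler and database names are taken from the question.

```python
def handle_crawler_event(event, glue_client):
    """Return the crawler's database name, or None if the event carries
    no crawler name. glue_client only needs a get_crawler(Name=...) method."""
    detail = (event or {}).get('detail') or {}
    crawler_name = detail.get('crawlerName')
    if not crawler_name:
        return None
    crawler = glue_client.get_crawler(Name=crawler_name)
    return crawler['Crawler']['DatabaseName']


class FakeGlue:
    """Hypothetical stand-in for boto3.client('glue'); returns canned data."""

    def get_crawler(self, Name):
        return {'Crawler': {'Name': Name, 'DatabaseName': 'video'}}


# Illustrative event, reduced to the fields the handler reads.
sample_event = {
    'source': 'aws.glue',
    'detail-type': 'Glue Crawler State Change',
    'detail': {'crawlerName': 'reddit_movie', 'state': 'Succeeded'},
}

print(handle_crawler_event(sample_event, FakeGlue()))
```

In a deployed function, the same helper could be called from lambda_handler with a real boto3 Glue client in place of the stub.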

Screenshot: adding the crawler CloudWatch event rule.

Upvotes: 2
