bda

Reputation: 422

Create or Replace AWS Glue Crawler

Using boto3:

  1. Is it possible to check whether an AWS Glue crawler already exists and create it if it doesn't?
  2. If it already exists, I need to update it.
  3. What would the crawler creation script look like?

Would this be similar to CREATE OR REPLACE TABLE in an RDBMS...

Has anyone done this or has recommendations?

Thank you :) Michael

Upvotes: 1

Views: 4010

Answers (3)

bda

Reputation: 422

I ended up using standard Python exception handling:

import boto3

# Instantiate the Glue client.
glue_client = boto3.client(
    'glue',
    region_name='us-east-1'
)

# Crawler configuration, shared by the create and update calls.
crawler_config = dict(
    Name='crawler name',
    Role='role to be used by Glue to create the crawler',
    DatabaseName='database where the crawler should create the table',
    Targets={
        'S3Targets': [
            {
                'Path': 'full s3 path to the directory that the crawler should process'
            }
        ]
    }
)

# Create the crawler, or update it if one with that name already exists,
# then start it. Catching AlreadyExistsException specifically (rather than
# using a bare except) avoids masking unrelated errors.
try:
    glue_client.create_crawler(**crawler_config)
except glue_client.exceptions.AlreadyExistsException:
    glue_client.update_crawler(**crawler_config)

glue_client.start_crawler(Name='crawler name')

Upvotes: 0

Ilya Kisil

Reputation: 2668

Yes, you can do all of that using boto3; however, there is no single function that does it all at once. Instead, you have to make a series of API calls: list_crawlers or get_crawler to check for existence, then create_crawler or update_crawler accordingly.

Each of these functions returns a response, which you need to parse/verify/check manually.

AWS is pretty good on their documentation, so definitely check it out. It might seem overwhelming, but in the beginning you might find it easiest to simply copy and paste the request syntax they provide in the docs and then strip out the unnecessary parts. boto3 itself is not very helpful for autocompletion/suggestions, but there is a project that can help with that, mypy_boto3_builder, and its predecessors mypy_boto3 and boto3_type_annotations.

If something goes wrong, e.g. you haven't specified some parameters correctly, their error responses are pretty good and helpful.

Here is an example of how you can list all existing crawlers:

import boto3
from pprint import pprint

client = boto3.client('glue')
response = client.list_crawlers()
available_crawlers = response["CrawlerNames"]

for crawler_name in available_crawlers:
    response = client.get_crawler(Name=crawler_name)
    pprint(response)

Assuming that in IAM you have a role AWSGlueServiceRoleDefault with all the permissions required by a Glue crawler, here is how you can create one:

response = client.create_crawler(
    Name='my-crawler-via-api',
    Role='AWSGlueServiceRoleDefault',
    Description='Crawler generated with Python API',  # optional
    Targets={
        'S3Targets': [
            {
                'Path': 's3://some/path/in/s3/bucket',
            },
        ],
    },
)

Upvotes: 2

Venky S

Reputation: 21

As far as I know, there is no single API for this. We manually list the crawlers using list_crawlers and iterate through the list to decide whether to add or update each crawler (update_crawler).

Check out the API @ https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html

Upvotes: 2
