Reputation: 422
Using boto3:
Would this be similar to CREATE OR REPLACE TABLE
in an RDBMS...
Has anyone done this, or does anyone have recommendations?
Thank you :) Michael
Upvotes: 1
Views: 4010
Reputation: 422
I ended up using standard Python exception handling:
import boto3

# Instantiate the Glue client.
glue_client = boto3.client(
    'glue',
    region_name='us-east-1'
)

# Attempt to create and start a Glue crawler on the PSV table,
# or update and start it if it already exists.
try:
    glue_client.create_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={
            'S3Targets': [
                {
                    'Path': 'full s3 path to the directory that crawler should process'
                }
            ]
        }
    )
    glue_client.start_crawler(Name='crawler name')
except glue_client.exceptions.AlreadyExistsException:
    # The crawler already exists, so update it and then start it instead.
    glue_client.update_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={
            'S3Targets': [
                {
                    'Path': 'full s3 path to the directory that crawler should process'
                }
            ]
        }
    )
    glue_client.start_crawler(Name='crawler name')
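Catching the specific AlreadyExistsException raised by create_crawler (rather than a bare except:) keeps unrelated failures, such as missing IAM permissions, from being silently turned into update attempts.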
Upvotes: 0
Reputation: 2668
Yes, you can do all of that using boto3; however, there is no single function that does it all at once. Instead, you would have to make a series of separate API calls (e.g. list_crawlers, get_crawler, create_crawler, update_crawler), each of which returns a response that you need to parse/verify/check yourself.
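For example, one way to check whether a crawler already exists is to call get_crawler and handle the not-found case yourself. This is only a minimal sketch; the crawler name below is a placeholder:

import boto3

glue = boto3.client('glue')

crawler_name = 'my-crawler'  # placeholder name

try:
    # get_crawler raises EntityNotFoundException when the crawler does not exist
    glue.get_crawler(Name=crawler_name)
    crawler_exists = True
except glue.exceptions.EntityNotFoundException:
    crawler_exists = False

print(crawler_name, 'exists:', crawler_exists)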
The AWS documentation is pretty good, so definitely check it out. It might seem overwhelming, but at the beginning you may find it easiest to simply copy and paste the request syntax they provide in the docs and then strip out the unnecessary parts. boto3 itself is not very helpful for autocompletion/suggestions, but there is a project that can help with that: mypy_boto3_builder and its predecessors mypy_boto3 and boto3_type_annotations.
If something goes wrong, e.g. you haven't specified some parameters correctly, the error responses are pretty good and helpful.
Here is an example of how you can list all existing crawlers:
import boto3
from pprint import pprint

client = boto3.client('glue')

response = client.list_crawlers()
available_crawlers = response["CrawlerNames"]

for crawler_name in available_crawlers:
    response = client.get_crawler(Name=crawler_name)
    pprint(response)
Assuming that in IAM you have a role AWSGlueServiceRoleDefault with all the permissions required by a Glue crawler, here is how you can create one:
response = client.create_crawler(
Name='my-crawler-via-api',
Role='AWSGlueServiceRoleDefault',
Description='Crawler generated with Python API', # optional
Targets={
'S3Targets': [
{
'Path': 's3://some/path/in/s3/bucket',
},
],
},
)
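If you also want the crawler to run right away, as in the question, you could then kick it off with start_crawler, passing the same name used in create_crawler above:

client.start_crawler(Name='my-crawler-via-api')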
Upvotes: 2
Reputation: 21
As far as I know, there is no single API call for this. We manually list the crawlers using list_crawlers and iterate through the result to decide whether to add (create_crawler) or update (update_crawler) each crawler.
Check out the API @ https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html
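A minimal sketch of that approach might look like the following; the crawler name, role, database and S3 path are placeholders, and for accounts with many crawlers you would also need to follow the NextToken returned by list_crawlers:

import boto3

glue = boto3.client('glue')

crawler_name = 'my-crawler'  # placeholder
crawler_config = {           # shared by create_crawler and update_crawler
    'Role': 'AWSGlueServiceRoleDefault',
    'DatabaseName': 'my_database',
    'Targets': {'S3Targets': [{'Path': 's3://my-bucket/my-prefix/'}]},
}

# List existing crawler names (a single page is assumed here).
existing = glue.list_crawlers()['CrawlerNames']

# Decide whether to add or update, then start the crawler.
if crawler_name in existing:
    glue.update_crawler(Name=crawler_name, **crawler_config)
else:
    glue.create_crawler(Name=crawler_name, **crawler_config)

glue.start_crawler(Name=crawler_name)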
Upvotes: 2