Reputation: 1628

Why is my AWS Glue crawler not creating any tables?

I'm attempting to use AWS Glue to ETL a MySQL database in RDS to S3 so that I can work with the data in services like SageMaker or Athena. At this time, I don't care about transformations; this is a prototype and I simply want to dump the DB to S3 to start testing the various toolchains.

- I've set up a Glue database and successfully tested the connection to RDS.
- I am using the AWS-provided Glue IAM service role.
- My S3 bucket has the correct prefix of aws-glue-*.
- I created a crawler using the Glue database, AWSGlue service role, and S3 bucket above, with the options:
  - Schema updates in the data store: Update the table definition in the data catalog
  - Object deletion in the data store: Delete tables and partitions from the data catalog

When I run the crawler, it completes in ~60 seconds but it does not create any tables in the database.

I've tried adding the Admin policy to the glue service role to eliminate IAM access issues and the result is the same.

Also, CloudWatch logs are empty. Log groups are created for the test connection and the crawler but neither contains any entries.

I'm not sure how to troubleshoot this further; information on AWS Glue seems pretty sparse.

Upvotes: 2

Answers (2)

Louis Cribbins

Reputation: 189

Ryan Fisher is correct in the sense that it's an error, though I wouldn't categorize it as a syntax error. When I ran into this, it was because the 'Include path' didn't include the default schema that SQL Server lovingly provides.

I had this: database_name/table_name

When it needed to be: database_name/dbo/table_name
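To catch this before running the crawler, a rough sanity check on the path shape can help. The sketch below is only a heuristic based on the two layouts mentioned in this thread (database/schema/table for engines with schemas such as SQL Server, database/table otherwise); `check_include_path` and its `supports_schema` flag are hypothetical names, not part of Glue or boto3, and Glue itself may accept paths this check flags.

```python
def check_include_path(path, supports_schema):
    """Return a list of warnings about a JDBC crawler include path.

    supports_schema: True when the engine's path is database/schema/table
    (e.g. SQL Server), False when it is database/table (e.g. MySQL, Oracle).
    This is a heuristic, not Glue's own validation.
    """
    parts = path.split("/")
    expected = 3 if supports_schema else 2
    layout = "database/schema/table" if supports_schema else "database/table"

    warnings = []
    if any(part == "" for part in parts):
        warnings.append("empty path segment")
    if len(parts) != expected:
        warnings.append(
            f"expected {expected} segments ({layout}), got {len(parts)}"
        )
    return warnings


# The path from this answer, before and after adding the dbo schema.
missing_schema = check_include_path("database_name/table_name", supports_schema=True)
with_schema = check_include_path("database_name/dbo/table_name", supports_schema=True)
```

Here `missing_schema` holds one warning about the segment count, while `with_schema` is an empty list.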

Upvotes: 1

Ryan

Reputation: 1628

Figured it out. I had a syntax error in my "include path" for the crawler. Make sure the connection is the data source (RDS in this case) and that the include path lists the data target you want, e.g. mydatabase/% (I forgot the /%).

You can substitute the percent (%) character for a schema or table. For databases that support schemas, type MyDatabase/MySchema/% to match all tables in MySchema within MyDatabase. Oracle and MySQL don't support schema in the path; instead, type MyDatabase/%. For information about which JDBC data stores support schema, see Cataloging Tables with a Crawler.
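For anyone scripting this instead of using the console, here is a minimal sketch of the same crawler definition as a request for boto3's Glue `create_crawler` call. The crawler, role, catalog database, and connection names are placeholders I made up; the schema change policy mirrors the two options from the question. The helper only builds the dictionary, so it runs without AWS credentials, and the actual API calls are left as comments.

```python
def build_crawler_request(name, role, catalog_database, connection_name, include_path):
    """Build keyword arguments for the Glue create_crawler API (JDBC source)."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": catalog_database,  # Glue Data Catalog database for the tables
        "Targets": {
            "JdbcTargets": [
                {
                    "ConnectionName": connection_name,  # the Glue connection to RDS
                    "Path": include_path,               # the crawler's include path
                }
            ]
        },
        "SchemaChangePolicy": {
            # "Update the table definition in the data catalog"
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            # "Delete tables and partitions from the data catalog"
            "DeleteBehavior": "DELETE_FROM_DATABASE",
        },
    }


# All names below are hypothetical placeholders.
request = build_crawler_request(
    name="rds-mysql-crawler",
    role="AWSGlueServiceRoleDefault",
    catalog_database="my_glue_db",
    connection_name="my-rds-connection",
    include_path="mydatabase/%",  # MySQL: database/%, no schema segment
)

# To actually create and run the crawler (requires boto3 and AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   glue.create_crawler(**request)
#   glue.start_crawler(Name=request["Name"])
```

For SQL Server, the include path would carry the schema as well, e.g. database_name/dbo/%.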

Upvotes: 4
