Reputation: 506
I am running an AWS Glue ETL job (PySpark) in which I create a boto3 Glue client to start a crawler and then do some other PySpark processing. The issue is that the Glue job keeps running after start_crawler is called. It neither raises an error nor terminates, and the crawler never starts. My code snippet is below:
import sys
import boto3
import time
glue_client = boto3.client('glue', region_name = 'us-east-1')
crawler_name = 'test_crawler'
print('Starting crawler...')
print(crawler_name)
glue_client.start_crawler(Name=crawler_name)
If I run the same code in a Python Shell Glue job, it starts the crawler successfully and the job terminates. What am I doing wrong here? Do I need to do something specific for a Glue ETL job?
Edit: My Glue job has a Glue connection attached to it which I am using to connect to RDS. If I remove this, then this code works fine. But I need this connection to be there to connect to RDS. Any help?
Upvotes: 0
Views: 2298
Reputation: 1
In my experience, a Glue job sometimes gets stuck instead of terminating gracefully after an exception is thrown. I suspect that your Glue service role is missing the permissions required to start the crawler. When you run the code as a Python Shell job you might be using a different role, which would explain what you observe.
In order to verify that, print the response of the start_crawler request and wrap the call in a try/except block so that you can print the error and shut down the job.
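A minimal sketch of that check follows. The helper name start_crawler_safely is hypothetical, and the broad except clause is a deliberate choice so that any failure (permissions, networking, an already running crawler) gets printed; it is one way to surface the error, not a confirmed fix for the hang.

```python
def start_crawler_safely(glue_client, crawler_name):
    """Call start_crawler and report the outcome instead of failing silently.

    Returns the API response on success, or None if the call raised.
    """
    try:
        response = glue_client.start_crawler(Name=crawler_name)
    except Exception as exc:  # broad on purpose: we want to see any failure
        print(f"start_crawler failed for {crawler_name!r}: "
              f"{type(exc).__name__}: {exc}")
        return None
    print(f"start_crawler response: {response}")
    return response


# Usage inside the Glue job (not executed here):
#   import sys
#   import boto3
#   glue_client = boto3.client('glue', region_name='us-east-1')
#   if start_crawler_safely(glue_client, 'test_crawler') is None:
#       sys.exit(1)  # stop the job explicitly instead of letting it hang
```

If the call never returns at all, nothing will be printed, which would point away from permissions and toward the request not reaching the Glue API.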
Upvotes: 0
Reputation: 11
I was having the same problem. After I moved my ETL jobs to AWS Glue 3.0, the boto3 client started working for me. Let me know if this doesn't solve your problem.
Upvotes: 0
Reputation: 162
This is not an answer to your question, just a tip. I don't think it's a good idea to start the crawler from within the same job: you have no control over when the crawler finishes or whether it finishes successfully. What I do is create an AWS Step Functions workflow that runs the Glue job first and, once it finishes, starts the crawler as the next step. That way you can control and monitor the process.
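As a rough sketch of such a workflow, the snippet below builds a two-state Step Functions definition in Python. The function name build_definition and the job name my_etl_job are hypothetical; the first state uses the Glue job integration that waits for the run to finish, and the second assumes the AWS SDK integration for Glue's StartCrawler call.

```python
import json


def build_definition(job_name, crawler_name):
    """Return a state machine definition: run the Glue job, then start the crawler."""
    return {
        "Comment": "Run a Glue job, then start a crawler (illustrative only)",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync makes the state wait until the job run completes
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": job_name},
                "Next": "StartCrawler",
            },
            "StartCrawler": {
                "Type": "Task",
                # AWS SDK integration; returns once the crawler has been started
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": crawler_name},
                "End": True,
            },
        },
    }


# 'my_etl_job' is a placeholder; 'test_crawler' is the name from the question
definition_json = json.dumps(build_definition("my_etl_job", "test_crawler"), indent=2)
```

The resulting JSON can be pasted into the Step Functions console or passed to create_state_machine. Note that the StartCrawler state only starts the crawler; waiting for it to finish would need an extra polling loop around GetCrawler, which is left out here.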
Upvotes: -1