seou1

Reputation: 506

Boto3 Glue in AWS Glue ETL Job

I am running an AWS Glue ETL job (PySpark) in which I have created a boto3 Glue client to start a crawler and do some other PySpark processing. The issue is that the Glue job keeps running after start_crawler is called. It neither raises an error, nor terminates, nor starts the crawler. My code snippet is below:

import sys
import boto3
import time

# Glue client used to start the crawler from inside the ETL job
glue_client = boto3.client('glue', region_name='us-east-1')
crawler_name = 'test_crawler'

print('Starting crawler...')
print(crawler_name)
glue_client.start_crawler(Name=crawler_name)

However, if I run the same code in a Python Shell Glue job, it successfully starts the crawler and the job terminates. What am I doing wrong here, or do I need to do something specific for a Glue ETL job?

Edit: My Glue job has a Glue connection attached to it, which I am using to connect to RDS. If I remove this connection, the code works fine. But I need the connection in order to reach RDS. Any help?

Upvotes: 0

Views: 2298

Answers (3)

m_m

Reputation: 1

From my experience, a Glue job sometimes gets stuck instead of terminating gracefully after an exception is thrown. I suspect that your Glue service role is missing the permissions required to start the crawler. When you run the code as a Python Shell job, you might be using a different role, which would explain your observation.

To verify this, print the response of the start_crawler request and wrap the call in a try/except block so that you can print the error and shut down the job.
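A minimal sketch of that wrapping, assuming the same crawler name as in the question and a plain sys.exit on failure:

    import sys
    import boto3
    from botocore.exceptions import ClientError

    glue_client = boto3.client('glue', region_name='us-east-1')
    crawler_name = 'test_crawler'  # placeholder crawler name

    try:
        # start_crawler returns an empty dict on success; print it anyway for visibility
        response = glue_client.start_crawler(Name=crawler_name)
        print(f'start_crawler response: {response}')
    except ClientError as e:
        # Surface permission or state errors (e.g. access denied, CrawlerRunningException)
        print(f'start_crawler failed: {e}')
        sys.exit(1)

If the role is missing glue:StartCrawler permission, the error should now show up in the job logs instead of the job silently hanging.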

Upvotes: 0

Asad Shah

Reputation: 11

I was having the same error and moved my ETL jobs to AWS Glue 3.0, and now the boto3 client works for me. Let me know if this doesn't solve your problem.

Upvotes: 0

Nico Arbar

Reputation: 162

This is not an answer to your question, just a tip. I don't think it's a good idea to start the crawler from inside the same job: you have no control over when the crawler finishes or whether it finishes successfully. What I do is create an AWS Step Functions workflow, with the Glue job as the first step and the crawler as the next step after it finishes. That way you can control and monitor the whole process.
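A rough sketch of that orchestration, defined and created from Python. The job name, crawler name, and role ARN are placeholders, and the availability of the Step Functions AWS SDK integration for StartCrawler in your region is an assumption:

    import json
    import boto3

    # Placeholder names/ARNs - replace with your own resources
    JOB_NAME = 'my_etl_job'
    CRAWLER_NAME = 'test_crawler'
    STATE_MACHINE_ROLE_ARN = 'arn:aws:iam::123456789012:role/StepFunctionsGlueRole'

    # Amazon States Language definition: run the Glue job, then start the crawler
    definition = {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # .sync makes Step Functions wait until the Glue job finishes
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": JOB_NAME},
                "Next": "StartCrawler"
            },
            "StartCrawler": {
                "Type": "Task",
                # AWS SDK integration that calls Glue StartCrawler
                "Resource": "arn:aws:states:::aws-sdk:glue:startCrawler",
                "Parameters": {"Name": CRAWLER_NAME},
                "End": True
            }
        }
    }

    sfn = boto3.client('stepfunctions', region_name='us-east-1')
    sfn.create_state_machine(
        name='etl-then-crawl',
        definition=json.dumps(definition),
        roleArn=STATE_MACHINE_ROLE_ARN,
    )

Note that the crawler start here is fire-and-forget; if later steps depend on the crawler having finished, the definition could instead poll the crawler state in a wait loop before moving on.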

Upvotes: -1
