Reputation: 61
I'm trying to create a Glue Job that enumerates all tables in a database in my catalog. In order to do so I use the following code snippet:
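import boto3

# Create a Glue client in the region that hosts the Data Catalog
session = boto3.Session(region_name='us-east-2')
glue = session.client('glue')
# List the tables in the 'customer1' database
tables = glue.get_tables(
    DatabaseName='customer1'
)
print(tables)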
The job hangs for about 15 minutes and the connection appears to be blocked, because I eventually get the following timeout error:
botocore.vendored.requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='glue.us-east-2.amazonaws.com', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to glue.us-east-2.amazonaws.com timed out. (connect timeout=60)'))
This issue is specific to the Glue API. I can use the S3 API with no problems.
I've gone through all my security groups and opened up all the ports to traffic from anywhere. I've even added self-referencing rules, but to no avail.
I can't figure out what could be causing the connection to be blocked. Is AWS specifically blocking Glue requests?
Upvotes: 6
Views: 3562
Reputation: 41
Glue job times out when calling the AWS boto3 client API
Solution: To restate plainly what @darius matonas replied: when a Glue job needs to call boto3 to get information about itself or about other jobs (something like get_job_run or get_job_runs), first make sure you create a new endpoint for Glue in the VPC and assign it to the same subnet and security group that your Glue connection uses.
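For anyone who wants to script that step, here is a minimal sketch of creating the Glue interface endpoint with boto3's EC2 client. It is not from the original reply. The function names and the IDs in the usage note are hypothetical placeholders, and it assumes the VPC has DNS support and DNS hostnames enabled, which private DNS requires.

```python
def glue_endpoint_params(region, vpc_id, subnet_ids, security_group_ids):
    """Build the keyword arguments for EC2 create_vpc_endpoint.

    Hypothetical helper: it only assembles a dict, so it can be
    inspected without touching AWS.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Glue's interface endpoint service name for the given region
        "ServiceName": "com.amazonaws.{}.glue".format(region),
        # Use the same subnet and security group as the Glue connection
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets glue.<region>.amazonaws.com resolve to the endpoint;
        # assumes DNS support and DNS hostnames are enabled on the VPC
        "PrivateDnsEnabled": True,
    }


def create_glue_endpoint(region, vpc_id, subnet_ids, security_group_ids):
    """Create the endpoint and return its ID (needs EC2 permissions)."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    params = glue_endpoint_params(region, vpc_id, subnet_ids, security_group_ids)
    response = ec2.create_vpc_endpoint(**params)
    return response["VpcEndpoint"]["VpcEndpointId"]
```

You would call it once from a machine with EC2 permissions, for example create_glue_endpoint('us-east-2', 'vpc-0123', ['subnet-0a'], ['sg-0b']), substituting your real IDs for those placeholders. The security group also has to allow inbound HTTPS (port 443) from the job for the endpoint to be reachable.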
Upvotes: 1
Reputation: 5114
I was facing the same problem: boto3 calls to glue or s3 were hanging and eventually timing out.
I fixed it by changing the subnet-id when creating the dev-endpoint. Initially I was using a subnet that routed traffic to an Internet Gateway. I switched to a subnet routing traffic to an internal NAT gateway. Hope this helps.
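If you are not sure which kind of subnet you have, something like the sketch below can tell you where its default route points. It is only an illustration: the function names are made up for this example, and it only finds route tables that are explicitly associated with the subnet (a subnet with no explicit association falls back to the VPC's main route table, which this filter does not return). My understanding, which the answer above does not state, is that the network interfaces Glue creates have only private IP addresses, so a route to an Internet Gateway alone does not give them outbound access.

```python
def default_route_target(routes):
    """Classify the 0.0.0.0/0 route from a list of EC2 route dicts.

    Returns 'nat-gateway', 'internet-gateway', 'other', or None when
    there is no default route at all.
    """
    for route in routes:
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("NatGatewayId"):
            return "nat-gateway"
        if route.get("GatewayId", "").startswith("igw-"):
            return "internet-gateway"
        return "other"
    return None


def subnet_default_route(subnet_id, region):
    """Look up the subnet's explicitly associated route table via boto3."""
    import boto3  # imported here so the classifier works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    tables = response["RouteTables"]
    if not tables:
        # No explicit association: the subnet uses the VPC main route table
        return None
    return default_route_target(tables[0]["Routes"])
```

A result of 'internet-gateway' would match the setup that was hanging for me, and 'nat-gateway' the one that worked.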
Upvotes: 1