Amree

Reputation: 2890

Connection timed out with JDBC connection from AWS Glue to RDS

I'm trying to connect to my PostgreSQL RDS instance directly from my AWS Glue script. I've tried connecting with the generated code and it works, but it doesn't work with a JDBC-type connection. Here's the code:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.dynamicframe import DynamicFrame
import pyspark.sql.functions as F
from pyspark.sql.functions import *

## Initialize
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()

job.commit()

Part of the error:

An error occurred while calling o74.load. : java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out. at com.amazon.redshift.client.PGClient.connect ....

Thanks in advance and let me know if you need more information.

Upvotes: 2

Views: 4648

Answers (1)

Amree

Reputation: 2890

I just found the reason: I hadn't specified the port in the JDBC URL (I don't remember ever including it before). Everything works after adding it:

df = spark \
     .read \
     .format('jdbc') \
     .option('url', 'jdbc:postgresql://host:5432/database_name') \
     .option('dbtable', "(SELECT * FROM table WHERE name = 'abcd') AS t") \
     .option('user', 'username') \
     .option('password', 'password') \
     .load()
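As a side note, a tiny helper (hypothetical, not part of the Glue or Spark API) can make the port explicit when assembling the JDBC URL, so it's harder to leave out:

```python
def jdbc_url(host, database, port=5432):
    """Build a PostgreSQL JDBC URL with an explicit port.

    5432 is PostgreSQL's default port; pass a different value if
    your RDS instance listens elsewhere.
    """
    return f"jdbc:postgresql://{host}:{port}/{database}"
```

Then `.option('url', jdbc_url('host', 'database_name'))` always produces a URL with the port included.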

Upvotes: 1

Related Questions