p_mcp

Reputation: 2791

Error Connecting to Redshift from Spark on Databricks

I am trying to connect to Redshift from Spark (running on Databricks):

from pyspark.sql import SQLContext

# `sc` and `sqlContext` are provided automatically in Databricks notebooks;
# ACCESS_KEY and SECRET_KEY are set elsewhere in the notebook.
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", SECRET_KEY)

# IP addresses from the Redshift Security Group panel
# (defined here for reference; not used in the code below)
IP_ADDRESSES_TO_ADD = ["1.2.3.4/32", "5.6.7.8/32"]
PORTS_TO_ADD = ["80", "443"]
PROTOCOLS_TO_ADD = ["tcp"]

# Read data from a query
df = sqlContext.read \
    .format("com.databricks.spark.redshift") \
    .option("url", "jdbc:redshift://XXX.XXX.eu-west-1.redshift.amazonaws.com:5439/REDSHIFT_DB?user=REDSHIFT_USER&password=REDSHIFT_PW&ssl=true&sslfactory=com.amazon.redshift.ssl.NonValidatingFactory") \
    .option("query", "select * FROM REDSHIFT_TABLE LIMIT 10") \
    .option("tempdir", "s3n://path/to/temp/") \
    .load()

However, I am getting the following error:

java.sql.SQLException: [Amazon](500150) Error setting/closing connection: Connection timed out.

Am I missing something?

Upvotes: 3

Views: 1662

Answers (2)

Chandan Bhattad

Reputation: 371

How are you spinning up the Databricks cluster nodes? Is it on demand? Every time the cluster terminates, you get a new set of IP addresses (EC2 instances) the next time you start it, so you need to make sure the newly assigned IP addresses are whitelisted in Redshift's inbound rules (see the sketch below).
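For example, a minimal sketch of adding such an inbound rule with boto3, assuming the Redshift cluster sits behind a VPC security group; SECURITY_GROUP_ID and DRIVER_IP are hypothetical placeholders, not values from the question:

import boto3

# Hypothetical placeholders: substitute your own values.
SECURITY_GROUP_ID = "sg-0123456789abcdef0"  # the Redshift cluster's VPC security group
DRIVER_IP = "1.2.3.4"                        # public IP of a Databricks cluster node

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Open the Redshift port (5439, per the JDBC URL above) to the node's IP.
ec2.authorize_security_group_ingress(
    GroupId=SECURITY_GROUP_ID,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5439,
        "ToPort": 5439,
        "IpRanges": [{"CidrIp": DRIVER_IP + "/32"}],
    }],
)

You would need to repeat this (or automate it) each time the cluster restarts with new IPs.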

Upvotes: 0

Rishi

Reputation: 1349

This looks like a connectivity error. Please verify whether your client is authorized to reach the cluster.

To verify this, run the command below:

telnet XXX.XXX.eu-west-1.redshift.amazonaws.com 5439

You should get something like this (if your client is authorized):

Trying <IP address>...
Connected to <Host name>.
Escape character is '^]'.

But if you get a connection timeout instead, the cluster is not reachable from your client, typically because its IP is not whitelisted in the security group's inbound rules. A rough Python equivalent of this check is sketched below.
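If telnet is not installed on the Databricks driver, a minimal sketch using only Python's standard library; the hostname is the placeholder from the question, so substitute your own endpoint:

import socket

# Placeholder host from the question: substitute your cluster endpoint.
host = "XXX.XXX.eu-west-1.redshift.amazonaws.com"
port = 5439

try:
    # Attempt a plain TCP connection; success means the port is reachable.
    with socket.create_connection((host, port), timeout=5):
        print("Connected to %s:%d, network path is open." % (host, port))
except OSError as exc:
    print("Could not reach %s:%d: %s" % (host, port, exc))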

Upvotes: 1
