TJ15

Reputation: 72

Redshift Not Connecting to Host via Python Script

I currently have a .csv file in an S3 bucket that I'd like to append to a table in a Redshift database using a Python script. I have a separate file parser and upload to S3 that work just fine.

The code I have for connecting to and copying into the table is below. I get the following error message:

OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out (0x0000274C/10060) Is the server running on host "redshift_cluster_name.unique_here.region.redshift.amazonaws.com" (18.221.51.45) and accepting TCP/IP connections on port 5439?

I can confirm the following:

What should I be fixing to make sure I can connect my file in S3 to Redshift? Thank you all for any help you can provide.

I have also looked around on Stack Overflow and ServerFault, but the questions I found either concern MySQL-to-Redshift migration, or their solutions (like the linked ServerFault CIDR fix) did not work for me.


import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

DATABASE = "db"
USER = "user"
PASSWORD = "password"
HOST = "redshift_cluster_name.unique_here.region.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "public"
S3_FULL_PATH = 's3://bucket/file.csv'
ARN_CREDENTIALS = 'arn:aws:iam::aws_id:role/myRedshiftRole'
REGION = 'region'

############ CONNECTING AND CREATING SESSIONS ############
connection_string = f"redshift+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = f"SET search_path TO {SCHEMA}"
s.execute(SetPath)
###########################################################

############ RUNNING COPY ############
copy_command = f'''
copy category from '{S3_FULL_PATH}'
credentials 'aws_iam_role={ARN_CREDENTIALS}'
delimiter ',' region '{REGION}';
'''
s.execute(copy_command)
s.commit()
######################################

############ CLOSE SESSION ############
s.close()
#######################################
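A connection timeout like the one above happens before any SQL (or credentials) are involved, so it can help to test raw TCP reachability first. A minimal sketch, assuming a hypothetical helper `port_open` (not part of the original script):

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    host = "redshift_cluster_name.unique_here.region.redshift.amazonaws.com"
    # False here points at security groups / routing / public accessibility,
    # not at the username, password, or COPY statement.
    print(port_open(host, 5439))
```

If this returns False, no change to the Python or SQL side will help until the network path is fixed.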

Upvotes: 2

Views: 9816

Answers (1)

John Rotenstein

Reputation: 269081

Connecting via a Python program would require the same connectivity as connecting from an SQL Client.

I created a new cluster so I could document the process for you.

Here are the steps I took:

  • Created a VPC with CIDR of 10.0.0.0/16. I don't really need to create another VPC, but I want to avoid any problems with prior configurations.
  • Created a Subnet in the VPC with CIDR of 10.0.0.0/24.
  • Created an Internet Gateway and attached it to the VPC.
  • Edited the default Route Table to send 0.0.0.0/0 traffic to the Internet Gateway. (I'm only creating a public subnet, so I don't need a separate route table for a private subnet.)
  • Created a Redshift Cluster Subnet Group with the single subnet I created.
  • Launched a 1-node Redshift cluster into the Cluster Subnet Group. Publicly accessible = Yes, default Security Group.
  • Went back to the VPC console to edit the Default Security Group. Added an Inbound rule for Redshift from Anywhere.
  • Waited for the Cluster to become ready.
  • I then used DbVisualizer to log in to the database. Success!

The above steps made a publicly-available Redshift cluster and I connected to it from my computer on the Internet.
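The network plan in the steps above can be sanity-checked with the standard library before creating anything in the console. A small sketch, assuming the same CIDRs and port as the walkthrough:

```python
import ipaddress

# CIDRs from the walkthrough above
vpc = ipaddress.ip_network("10.0.0.0/16")
public_subnet = ipaddress.ip_network("10.0.0.0/24")
default_route = ipaddress.ip_network("0.0.0.0/0")
redshift_port = 5439

# The subnet must sit inside the VPC's CIDR for the
# Cluster Subnet Group to accept it.
assert public_subnet.subnet_of(vpc)

# The 0.0.0.0/0 route covers every destination address, i.e. it sends
# all non-local traffic to the Internet Gateway.
assert default_route.supernet_of(public_subnet)

print(f"{public_subnet} inside {vpc}; Redshift listens on {redshift_port}")
```

Getting these relationships wrong (e.g. a subnet outside the VPC CIDR, or a missing 0.0.0.0/0 route) is a common cause of exactly the timeout the question shows.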

Upvotes: 11
