CelestialSky
CelestialSky

Reputation: 67

Failed to connect to Redshift Database usng Boto3 and Psycopg2

My python codes fail to make a connection to Redshift database utilizing Boto3 api and psycopg2 library

Let me ask about connecting to the db. I do not own it. I'm trying to use Boto3 to get_cluster_credentials. My codes will not connect to the database an I'm thinking it's an authorization problem but I am most likely wrong, not sure. Can you look at the codes I've written and tell me if they look ok or am I missing something which is entirely possible. When I run it the last line prints.

import boto3
import psycopg2

client = boto3.client(service_name='redshift', region_name='us-east-1')

cluster_creds = client.get_cluster_credentials(
    DbUser='#######',
    DbName='##########',
    ClusterIdentifier='########redshift',
    password = clusster_creds['DbPassword'],
    AutoCreate=False)

try:
    conn = psycopg2.connect(
        host = '########-redshift.#####.us-east-1.redshift.amazonaws.com',
        prt = '5439',
        user = cluster_creds['DbUser'],
        database = 'DbName' )

    #Verify connection
    cursor = conn.cursor()
    cursor.execute("SELECT VERSION()")
    results = cursor.fetchone()
    ver=results[0]

    if (ver is None):
        print("Could not find version")
    else:
        print("The version is " + ver)

except:
    print("Failed to open database connection - James")

I'm thinking I should see "The version is ", NOT 'Failed to open database...'

Upvotes: 0

Views: 3161

Answers (3)

CelestialSky
CelestialSky

Reputation: 67

I was able to connect to AWS Redshift via Python, Boto3 and Psycopg2 APIs, get temp user credentials and create table in Redshift. I also copied data from S3 bucket to that Redshift table. I separated my codes for better readability.

I have to thank @demicioglu here on Stack Overflow and Jessica Sharp at the Dallas/Fort Worth Postgres meetup group who provided examples at: https://github.com/sharpDBA/redshift_demo

db.Conn # connection information
host = '#####-redshift.#####.us-east-1.redshift.amazonaws.com'
port = 5439
database = '#####'
db.Cred # Credential information
import boto3
import psycopg2
import sys

DB_NAME = '######'
CLUSTER_IDENTIFIER = '######-redshift'
DB_USER = '#####'

try:
  client = boto3.client('redshift', region_name='us-east-1')

  #get cluster credentials and temp username and password
  cluster_creds = client.get_cluster_credentials(
       DbUser=DB_USER, DbName=DB_NAME, ClusterIdentifier=CLUSTER_IDENTIFIER, AutoCreate=False)
  temp_user = cluster_creds['DbUser']
  temp_pswd = cluster_creds['DbPassword']

   #report any errors
except Exception as ex:
    print("Exception name : " + ex.__class__.__name__)
    print(str(ex))
    print("Failed to open connection to Redshift database")
    sys.exit(1)
db.CreateTbl # Create table in Redshift
import psycopg2
import dbCred
import dbConn
import sys

# Set Redshift cluster connection details
def CreateTable(schema, tblName):
    schema=schema
    tblName=tblName

    try:
        # Get AWS Redshift connection attributes
        dbname = dbConn.database
        host = dbConn.host
        port = dbConn.port

        # Get temporary database credentials
        user = dbCred.temp_user
        password = dbCred.temp_pswd

        # Connect to AWS Redshift database
        connect = psycopg2.connect(database=dbname,
                                   host=host,
                                   port=port,
                                   user=user,
                                   password=password)

        cur = connect.cursor()

        # SQL Query

        cur.execute("CREATE TABLE " + schema + "." + tblName + " "
                    "(vendorid varchar(4), pickup_datetime TIMESTAMP, "
                    "dropoff_datetime TIMESTAMP, store_and_fwd_flag varchar(1), "
                    "ratecode int, pickup_longitude float(4), pickup_latitude float(4),"
                    "dropoff_logitude float(4), dropoff_latitude float(4), "
                    "passenger_count int, trip_distance float(40), fare_amount float(4), "
                    "extra float(4), mta_tax float(4), tip_amount float(4), "
                    "tolls_amount float(4), ehail_fee float(4), improvement_surcharge float(4), "
                    "total_amount float(4), payment_type varchar(4), trip_type varchar(4)) "
                    "DISTSTYLE EVEN SORTKEY (passenger_count, pickup_datetime);")

        connect.commit()

        #report any errors
    except Exception as ex:
        print("Exception name : " + ex.__class__.__name__)
        print(str(ex))
        print("Failed to open connection to Redshift database")
        sys.exit(1)

    #close all connections
    finally:
        cur.close()
        connect.close()

if __name__ == "__main__":
    if (len(sys.argv) != 3):
        print("Usage: " + sys.argv[0] + "<SchemaName>" + "<TableName>")
        sys.exit(1)
    else:
        schema = sys.argv[1]
        tblName = sys.argv[2]

    CreateTable(schema, tblName)

Upvotes: 0

Surya-ra
Surya-ra

Reputation: 11

I was having similar problem. I guess the issue is with the psycopg2 package you are using.

Background

This soution is helpful to you if you are certain that-

  1. You have configured the IAM roles properly.
  2. Your VPC configurations are perfect and yet you can't connect to the RS database.

Solution

So, finally I found the problem- psycopg2 build that I was using was not compatible with AWS lambda function. So, this is what I did-

  • Installed psycopg2 using pip. Make sure you do this step using a linux OS as lambda's execution environment uses Linux.
pip3 install psycopg2 -t ./
  • Created a deployment package for lambda and uploaded it to your lambda.

PS- Using Linux to build the psycopg package is the key.

Upvotes: 1

demircioglu
demircioglu

Reputation: 3465

Try this code, it works for me, you don't need boto3, it's for S3.

import psycopg2

try:
    con = psycopg2.connect(
        dbname="####",
        host="####",
        port="5439",
        user="####",
    password="####",
)

    cursor = con.cursor()
    cursor.execute("SELECT VERSION()")
    results = cursor.fetchone()
    ver=results[0]

    if (ver is None):
        print("Could not find version")
    else:
        print("The version is " + ver)

except:
    print("Failed to open database connection - James")


con.close()

If your cluster is behind a firewall you'll need sshtunnel I can help with that too.

Upvotes: 0

Related Questions