Garet Jax

Reputation: 1171

Lambda using python 3.6 & boto3 in VPC times out when connecting to Redshift

I am trying to use boto3 in python3.6 to connect to my Redshift cluster using the get_cluster_credentials API. The following code times out 100% of the time when the Lambda function is added to the VPC. It runs without issue when Lambda is not added to the VPC.

I can't figure out if get_cluster_credentials uses the public or private IP to access Redshift. I also can't figure out if there is a way to force it to use one or the other.

import json
import boto3

def lambda_handler(event, context):
    redshiftClient = boto3.client('redshift', region_name='us-east-1')
    cluster_creds = redshiftClient.get_cluster_credentials( DbUser='awsuser',
                                                            DbName='dev',
                                                            ClusterIdentifier='redshift-cluster-1',
                                                            AutoCreate=False)
    print(cluster_creds)

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

My configuration is very simple. The NACL lets everything (0.0.0.0/0) through on all ports and protocols. My security group (SG) does the same thing.

I have 1 internet gateway defined: igw-0d1e6dcbfdea792b2

I have 1 subnet and 1 routing table in the VPC. The routing table has one rule to map 0.0.0.0/0 --> igw-0d1e6dcbfdea792b2.

I am able to connect from outside AWS to the cluster using SQL Workbench/J without issue.

I have looked at many posts, threads and documents, but cannot figure out what is happening:

AWS Lambda times out connecting to RedShift

Connect Lambda to Redshift in Different Availability Zones

https://github.com/awslabs/aws-lambda-redshift-loader/issues/86

Accessing Redshift from Lambda - Avoiding the 0.0.0.0/0 Security Group

https://aws.amazon.com/blogs/big-data/a-zero-administration-amazon-redshift-database-loader/

Conecting AWS Lambda to Redshift - Times out after 60 seconds

Please help.

Thanks a lot.

Upvotes: 3

Views: 2017

Answers (2)

Matti Lyra

Reputation: 13078

You should be able to connect to Redshift directly from the VPC without an Internet or NAT gateway. This is what AWS PrivateLink is for, and Redshift is a supported service.

A generic description of the process (service specific variations apply):

  • Go to VPC -> Endpoints in the AWS console
  • Create a new endpoint
  • Select which service you want to create the endpoint for
  • Configure the endpoint's security group, etc.

Now, in your code when you create the client, you need to define the region and the endpoint for the client.
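A minimal sketch of what that client setup could look like, assuming an interface endpoint for the Redshift API already exists in the VPC. The helper names and the endpoint URL used in the usage note are hypothetical placeholders, not values from the question. If private DNS is enabled on the endpoint, the default service hostname should already resolve to it and passing only the region is likely enough.

```python
def redshift_client_kwargs(region_name, endpoint_url=None):
    """Build keyword arguments for boto3.client('redshift', ...).

    endpoint_url is optional: pass the VPC endpoint's DNS name (as an
    https:// URL) only when the default service hostname does not already
    resolve to the endpoint.
    """
    kwargs = {"region_name": region_name}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def make_redshift_client(region_name, endpoint_url=None):
    """Create a Redshift API client that talks to the given endpoint."""
    import boto3  # imported here so the helper above has no dependencies

    return boto3.client(
        "redshift", **redshift_client_kwargs(region_name, endpoint_url)
    )
```

In the Lambda handler this would replace the plain boto3.client('redshift', region_name='us-east-1') call, for example make_redshift_client('us-east-1', 'https://<your-endpoint-dns-name>'), where the URL is copied from the endpoint's details page.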

Disclaimer: I've not done this for Redshift, but I have done it for STS and it works.

Upvotes: 1

John Rotenstein

Reputation: 269081

As per your other question, when an AWS Lambda function is added to a VPC, it does not receive a Public IP address. Therefore, if the function wishes to access the Internet (in this case to make the get_cluster_credentials() call), you should:

  • Add a NAT Gateway in a Public subnet
  • Attach the Lambda function to a Private subnet
  • Set routing on the private subnet to use the NAT Gateway for 0.0.0.0/0

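It will not work if you have only one subnet: the NAT Gateway must sit in a subnet whose 0.0.0.0/0 route points to the Internet Gateway, while the Lambda function's subnet must route 0.0.0.0/0 to the NAT Gateway, and a single route table cannot do both.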
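The NAT Gateway steps above can also be scripted. The following is a rough sketch using the boto3 EC2 client, assuming the public and private subnets and their route tables already exist. The function name and its parameters are hypothetical, and the IDs you pass in are your own.

```python
def route_private_subnet_through_nat(ec2, public_subnet_id, private_route_table_id):
    """Create a NAT Gateway in the public subnet and make it the default
    route for the private subnet's route table.

    ec2 is expected to be a boto3 EC2 client, e.g. boto3.client('ec2').
    """
    # A public NAT Gateway needs an Elastic IP address.
    allocation = ec2.allocate_address(Domain="vpc")

    nat = ec2.create_nat_gateway(
        SubnetId=public_subnet_id,
        AllocationId=allocation["AllocationId"],
    )
    nat_id = nat["NatGateway"]["NatGatewayId"]

    # The gateway takes a while to become usable; wait before routing to it.
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat_id])

    # Assumes the private route table has no 0.0.0.0/0 route yet; if it
    # does, replace_route would be needed instead of create_route.
    ec2.create_route(
        RouteTableId=private_route_table_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_id,
    )
    return nat_id
```

The Lambda function then needs to be attached to the private subnet only. Passing the client in as an argument keeps the function easy to try against a stub before running it on a real account.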

I have also had success manually assigning an Elastic IP address to the Lambda function's ENI (instead of using a NAT Gateway), but this will not scale because Lambda might deploy additional containers and therefore additional ENIs. It might be sufficient if the function runs rarely and never concurrently.

Upvotes: 4
