john

Reputation: 2544

AWS Aurora Serverless - Communication Link Failure

I'm using a MySQL Aurora Serverless cluster (with the Data API enabled) from my Python code, and I am getting a communications link failure exception. This usually occurs when the cluster has been dormant for some time.

Once the cluster is active, I get no error, but I have to send 3-4 requests each time before it works.

Exception detail:

The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server. An error occurred (BadRequestException) when calling the ExecuteStatement operation: Communications link failure

How can I solve this issue? I am using the standard boto3 library.

Upvotes: 12

Views: 9711

Answers (6)

amason13

Reputation: 39

In your CloudFormation/SAM template, add AutoPause: false to the ScalingConfiguration. Here's a snippet from mine.

 Type: 'AWS::RDS::DBCluster'
 Properties:
   DatabaseName: !Ref DBName
   DBClusterIdentifier: name
   Engine: aurora-mysql
   EngineVersion: 5.7.mysql_aurora.2.11.3
   EngineMode: serverless
   EnableHttpEndpoint: true
   ScalingConfiguration:
     MinCapacity: 1
     MaxCapacity: 64
     AutoPause: false

Upvotes: 0

MKesper

Reputation: 509

If your database is available and the Data API is enabled, this error can also be caused by insufficient permissions to access the database. Relevant AWS documentation: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.access

Extended error message in my case: The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.; SQLState: 08S01

Upvotes: 0

FelipeCruzV10

Reputation: 486

This may be a little late, but there is a way to disable the auto-pause ("dormant") behavior of the database.

When creating the cluster with the CDK, you can configure the scaling attribute as follows:

new rds.ServerlessCluster(
  this,
  'id',
  {
    engine: rds.DatabaseClusterEngine.AURORA_MYSQL,
    defaultDatabaseName: 'name',
    vpc,
    scaling:{
      autoPause:Duration.millis(0) //Set to 0 to disable
    }
  }
)

The attribute is autoPause. The default value is 5 minutes (the "Communications link failure" message may appear after 5 minutes of not using the DB). The maximum value is 24 hours. However, you can set the value to 0, which disables automatic pausing. After this, the database will not go to sleep even if there are no connections.

When looking at the configuration in the AWS console (RDS -> Databases -> 'instance' -> Configuration -> Capacity Settings), you'll notice that this attribute has no value (if set to 0).

Finally, if you don't want the database to be ON all the time, set your own autoPause value so that it behaves as expected.
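
For a cluster that already exists, the same setting can in principle be changed through the RDS API instead of redeploying. The following is a minimal sketch, assuming boto3's `modify_db_cluster` call and its `ScalingConfiguration` parameter. The helper names `build_scaling_config` and `set_auto_pause` are hypothetical, and it has not been run against a live cluster.

```python
def build_scaling_config(auto_pause, seconds_until_auto_pause=300):
    """Build the ScalingConfiguration dict for an Aurora Serverless cluster."""
    if not auto_pause:
        # Pausing disabled: the cluster keeps compute assigned while idle.
        return {"AutoPause": False}
    # The RDS API accepts 300 to 86400 seconds (5 minutes to 24 hours).
    if not 300 <= seconds_until_auto_pause <= 86400:
        raise ValueError("seconds_until_auto_pause must be between 300 and 86400")
    return {"AutoPause": True, "SecondsUntilAutoPause": seconds_until_auto_pause}


def set_auto_pause(cluster_id, auto_pause, seconds_until_auto_pause=300, rds_client=None):
    """Apply the pause setting to an existing cluster (needs AWS credentials)."""
    if rds_client is None:
        import boto3  # imported lazily so build_scaling_config has no dependencies
        rds_client = boto3.client("rds")
    return rds_client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ScalingConfiguration=build_scaling_config(auto_pause, seconds_until_auto_pause),
    )
```

Calling `set_auto_pause('name', False)` would correspond to the autoPause value of 0 above, while `set_auto_pause('name', True, 3600)` would pause after an hour of inactivity.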

Upvotes: 0

Ratul

Reputation: 1

I also got this issue, and taking inspiration from the solution used by Arless and the conversation with Jimbo, came up with the following workaround.

I defined a decorator which retries the serverless RDS request, with an increasing wait between attempts, until a configurable maximum number of attempts is reached.

import logging
import functools
from sqlalchemy import exc
import time

logger = logging.getLogger()


def retry_if_db_inactive(max_attempts, initial_interval, backoff_rate):
    """
    Retry the function if the serverless DB is still in the process of 'waking up'.
    The retry configuration follows the same concepts as AWS Step Functions retries.
    :param max_attempts: The maximum number of retry attempts
    :param initial_interval: The initial duration to wait (in seconds) when the first 'Communications link failure' error is encountered
    :param backoff_rate: The factor to use to multiply the previous interval duration, for the next interval
    :return:
    """

    def decorate_retry_if_db_inactive(func):

        @functools.wraps(func)
        def wrapper_retry_if_inactive(*args, **kwargs):
            interval_secs = initial_interval
            attempt = 0
            while attempt < max_attempts:
                attempt += 1
                try:
                    return func(*args, **kwargs)

                except exc.StatementError as err:
                    if hasattr(err.orig, 'response'):
                        error_code = err.orig.response["Error"]['Code']
                        error_msg = err.orig.response["Error"]['Message']

                        # Aurora serverless is waking up
                        if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                            logger.info('Sleeping for ' + str(interval_secs) + ' secs, awaiting RDS connection')
                            time.sleep(interval_secs)
                            interval_secs = interval_secs * backoff_rate
                        else:
                            raise err
                    else:
                        raise err

            raise Exception('Waited for RDS Data but still getting error')

        return wrapper_retry_if_inactive

    return decorate_retry_if_db_inactive

It can then be used something like this:

@retry_if_db_inactive(max_attempts=4, initial_interval=10, backoff_rate=2)
def insert_alert_to_db(sqs_alert):
    with db_session_scope() as session:
        # your db code
        session.add(sqs_alert)

    return None

Please note I'm using SQLAlchemy, so the code would need tweaking to suit specific purposes, but hopefully it will be useful as a starter.

Upvotes: 0

Arless

Reputation: 462

In case it is useful to someone, this is how I manage retries while Aurora Serverless wakes up.

The client returns a BadRequestException, so boto3 will not retry even if you change the retry config for the client; see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html.

My first option was to try Waiters, but RDSData does not have any. I then tried to create a custom Waiter with an error matcher, but it only matches the error code and ignores the message. Because a BadRequestException can also be raised by an error in a SQL statement, I needed to validate the message too, so I am using a kind of waiter function:

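import logging
import time

import boto3
from botocore.exceptions import ClientError

logger = logging.getLogger()
rds_data = boto3.client('rds-data')

# DB_NAME, CLUSTER_ARN and SECRET_ARN are assumed to be defined elsewhere.


def _wait_for_serverless():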
    delay = 5
    max_attempts = 10

    attempt = 0
    while attempt < max_attempts:
        attempt += 1

        try:
            rds_data.execute_statement(
                database=DB_NAME,
                resourceArn=CLUSTER_ARN,
                secretArn=SECRET_ARN,
                sql='SELECT * FROM dummy'  # boto3's execute_statement takes `sql`
            )
            return
        except ClientError as ce:
            error_code = ce.response.get("Error").get('Code')
            error_msg = ce.response.get("Error").get('Message')

            # Aurora serverless is waking up
            if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                logger.info('Sleeping ' + str(delay) + ' secs, waiting RDS connection')
                time.sleep(delay)
            else:
                raise ce

    raise Exception('Waited for RDS Data but still getting error')

I use it like this:

def begin_rds_transaction():
    _wait_for_serverless()

    return rds_data.begin_transaction(
        database=DB_NAME,
        resourceArn=CLUSTER_ARN,
        secretArn=SECRET_ARN
    )
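
A variant of the waiter above can wrap the real call instead of a probe query and bound the total wait time rather than the attempt count. This is only a sketch under stated assumptions: `is_waking_up` and `call_with_wakeup_retry` are hypothetical names, the default budget and delays are arbitrary, and the error is recognized purely by the `response` dict that botocore's `ClientError` carries.

```python
import time


def is_waking_up(err):
    """Return True if err looks like the Data API error raised while Aurora resumes."""
    response = getattr(err, "response", None)
    if not isinstance(response, dict):
        return False
    error = response.get("Error", {})
    return (
        error.get("Code") == "BadRequestException"
        and "Communications link failure" in error.get("Message", "")
    )


def call_with_wakeup_retry(func, budget_secs=120, initial_delay=2, backoff=2,
                           sleep=time.sleep, clock=time.monotonic):
    """Call func(), retrying wake-up errors with growing delays until the budget is spent."""
    deadline = clock() + budget_secs
    delay = initial_delay
    while True:
        try:
            return func()
        except Exception as err:
            # Anything other than the wake-up error (e.g. bad SQL) is re-raised at once.
            if not is_waking_up(err):
                raise
            remaining = deadline - clock()
            if remaining <= 0:
                # Budget exhausted: surface the last wake-up error to the caller.
                raise
            sleep(min(delay, remaining))
            delay *= backoff
```

With the names from the answer above, a statement could be wrapped as `call_with_wakeup_retry(lambda: rds_data.execute_statement(database=DB_NAME, resourceArn=CLUSTER_ARN, secretArn=SECRET_ARN, sql='SELECT 1'))`. The `sleep` and `clock` parameters exist only so the timing can be replaced in tests.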

Upvotes: 8

john

Reputation: 2544

Here is the reply from AWS Premium Business Support.

Summary: This is expected behavior.

Detailed Answer:

I can see that you receive this error when your Aurora Serverless instance is inactive, and you stop receiving it once your instance is active and accepting connections. Please note that this is expected behavior. In general, Aurora Serverless works differently from Provisioned Aurora. In Aurora Serverless, while the cluster is "dormant" it has no compute resources assigned to it, and when a DB connection is received, compute resources are assigned. Because of this behavior, you will have to "wake up" the cluster, and it may take a few minutes for the first connection to succeed, as you have seen.

To avoid that, you may consider increasing the timeout on the client side. Also, if you have enabled Pause, you may consider disabling it [2]. After disabling Pause, you can also adjust the minimum Aurora capacity unit (ACU) to a higher value to make sure that your cluster always has enough computing resources to serve new connections [3]. Please note that adjusting the minimum ACU might increase the cost of the service [4].

Also note that Aurora Serverless is only recommended for certain workloads [5]. If your workload is highly predictable and your application needs to access the DB on a regular basis, I would recommend you use a Provisioned Aurora cluster/instance to ensure high availability for your business.

[2] How Aurora Serverless Works - Automatic Pause and Resume for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html#aurora-serverless.how-it-works.pause-resume

[3] Setting the Capacity of an Aurora Serverless DB Cluster - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.setting-capacity.html

[4] Aurora Serverless Price https://aws.amazon.com/rds/aurora/serverless/

[5] Using Amazon Aurora Serverless - Use Cases for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.use-cases

Upvotes: 12
