Reputation: 5660
I am trying to use Terraform to scale an RDS cluster for Aurora.
I am setting up an RDS instance with 3 servers - 1 writer and 2 read replicas. Here are my requirements:
When any of the servers fails, add a new server so that the cluster always has a minimum of 3 servers.
When the CPU usage of any host exceeds 50%, add a new server to the cluster. The max number of servers is 4.
Is it possible to create a policy such that when any of the 3 servers fails, a new server is created for that RDS instance? If yes, how do I monitor server failure?
Do I need to use appAutoScaling, autoScaling, or both? This is the link that matches my use case: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy
Upvotes: 1
Views: 929
Reputation: 238957
I developed an example terraform config file for your question. It is ready to use, but should be treated as an example only, for learning and testing purposes. It was tested in the us-east-1 region using a default VPC, terraform 0.13 and AWS provider 3.6.
The key resources created by the example terraform config file are:

- an Aurora MySQL cluster (aws_rds_cluster with three aws_rds_cluster_instance members),
- an application auto scaling target and policy for the read replicas,
- RDS event subscriptions publishing to an SNS topic, with an SQS queue subscribed to it.

Below I expand on the questions asked and on the example config file.

The cluster will be provisioned with 1 writer and 2 read replicas.

The application auto scaling policy is of type TargetTrackingScaling and uses the predefined RDSReaderAverageCPUUtilization metric. The scaling policy is based on the overall CPU utilization of the replicas (50% target), not on individual replicas.
This is a good practice, as Aurora replicas are load balanced automatically at the connection level. This means that new connections will be spread roughly equally across the available replicas, on condition that you are using the reader endpoint.
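As a quick illustration (using the database name and master user from the example config below), you would connect through the reader endpoint rather than an individual instance endpoint:
mysql -h <reader-endpoint> -u root -p myauroradb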
Also, any alarm or scaling policy which you apply to individual replicas will become void once the replicas get replaced by scaling in/out activities or failures. This is because such a policy would be bound to a specific db instance; once the instance is gone, the alarm will not work.
The alarms associated with the policy that AWS creates on your behalf can be viewed in the CloudWatch Alarms Console.
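They can also be listed with the AWS CLI; the auto-created alarm names should start with a TargetTracking prefix (an assumption based on how Application Auto Scaling usually names these alarms):
aws cloudwatch describe-alarms --alarm-name-prefix TargetTracking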
If any db instance fails, Aurora will automatically proceed with fixing the problem, which can include restarting the db instance, promoting a read replica to be the new master, restarting MySQL, or fully replacing a failed instance.
You can simulate these events yourself to some extent, as described in Testing Amazon Aurora Using Fault Injection Queries.
Test failover to read replica
aws rds failover-db-cluster --db-cluster-identifier aurora-cluster-demo
Test crash of master instance
This will result in an automated restart of the instance:
mysql -h <endpoint> -u root -e "ALTER SYSTEM CRASH INSTANCE;"
Test crash of reader instance
This will result in restarting MySQL.
mysql -h <endpoint> -u root -e "ALTER SYSTEM SIMULATE 100 PERCENT READ REPLICA FAILURE TO ALL FOR INTERVAL 10 MINUTE;"
Test replacement of the reader
You can simulate total failure of the reader instance by manually deleting it in the console. Once deleted, Aurora will provision a replacement automatically.
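The same can be done from the CLI; assuming the instance naming from the config below and that aurora-cluster-demo-2 is currently acting as a reader, it would be something like:
aws rds delete-db-instance --db-instance-identifier aurora-cluster-demo-2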
You can use Amazon RDS Event Notification to automatically detect and respond to a variety of events associated with your Aurora cluster and its instances. Failures are one of the event types captured by the RDS Event Notification mechanism.
You can subscribe to the categories of events of interest and receive notifications through SNS. Once the events are detected and published to SNS, you can do whatever you want with them. For example, invoke a lambda function to analyze the event and the current state of your Aurora cluster, execute corrective actions, or send email notifications.
For example, when you manually force a failover as shown earlier, you will get a message with the following info (only a fragment is shown):
\"Event Message\":\"Started cross AZ failover to DB instance: aurora-cluster-demo-1\"
and later:
\"Event Message\":\"Completed failover to DB instance: aurora-cluster-demo-1\"}"
The example terraform config file subscribes to a number of categories, so you would have to fine-tune them to exactly what you require. You could also subscribe to all of them and have a lambda function analyze the events as they happen, deciding whether they should only be archived or whether the function should execute some automated procedure.
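Such wiring is not part of the example below, but a minimal sketch could look like this; the aws_lambda_function resource named event_analyzer is hypothetical and would have to be defined separately:
# Subscribe a (hypothetical) Lambda function to the SNS topic from the config below
resource "aws_sns_topic_subscription" "lambda_target" {
  topic_arn = aws_sns_topic.default.arn
  protocol  = "lambda"
  endpoint  = aws_lambda_function.event_analyzer.arn
}

# Allow SNS to invoke the function
resource "aws_lambda_permission" "allow_sns" {
  statement_id  = "AllowExecutionFromSNS"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.event_analyzer.function_name
  principal     = "sns.amazonaws.com"
  source_arn    = aws_sns_topic.default.arn
}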
Aurora read replicas are scaled using application auto scaling, not AutoScaling (I assume here that you mean EC2 AutoScaling). EC2 AutoScaling is used only for regular EC2 instances, not for RDS.
provider "aws" {
# YOUR DATA
region = "us-east-1"
}
data "aws_vpc" "default" {
default = true
}
resource "aws_rds_cluster" "default" {
cluster_identifier = "aurora-cluster-demo"
engine = "aurora-mysql"
engine_version = "5.7.mysql_aurora.2.03.2"
database_name = "myauroradb"
master_username = "root"
master_password = "bar4343sfdf233"
vpc_security_group_ids = [aws_security_group.allow_mysql.id]
backup_retention_period = 1
skip_final_snapshot = true
}
resource "aws_rds_cluster_instance" "cluster_instances" {
count = 3
identifier = "aurora-cluster-demo-${count.index}"
cluster_identifier = aws_rds_cluster.default.id
instance_class = "db.t2.small"
publicly_accessible = true
engine = aws_rds_cluster.default.engine
engine_version = aws_rds_cluster.default.engine_version
}
resource "aws_security_group" "allow_mysql" {
name = "allow_mysql"
description = "Allow Mysql inbound Internet traffic"
vpc_id = data.aws_vpc.default.id
ingress {
description = "Mysql poert"
from_port = 3306
to_port = 3306
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "aws_appautoscaling_target" "replicas" {
service_namespace = "rds"
scalable_dimension = "rds:cluster:ReadReplicaCount"
resource_id = "cluster:${aws_rds_cluster.default.id}"
min_capacity = 2
max_capacity = 4
}
resource "aws_appautoscaling_policy" "replicas" {
name = "cpu-auto-scaling"
service_namespace = aws_appautoscaling_target.replicas.service_namespace
scalable_dimension = aws_appautoscaling_target.replicas.scalable_dimension
resource_id = aws_appautoscaling_target.replicas.resource_id
policy_type = "TargetTrackingScaling"
target_tracking_scaling_policy_configuration {
predefined_metric_specification {
predefined_metric_type = "RDSReaderAverageCPUUtilization"
}
target_value = 50
scale_in_cooldown = 300
scale_out_cooldown = 300
}
}
resource "aws_sns_topic" "default" {
name = "rds-events"
}
resource "aws_sqs_queue" "default" {
name = "aurora-notifications"
}
resource "aws_sns_topic_subscription" "user_updates_sqs_target" {
topic_arn = aws_sns_topic.default.arn
protocol = "sqs"
endpoint = aws_sqs_queue.default.arn
}
resource "aws_sqs_queue_policy" "test" {
queue_url = aws_sqs_queue.default.id
policy = <<POLICY
{
"Version": "2012-10-17",
"Id": "sqspolicy",
"Statement": [
{
"Sid": "First",
"Effect": "Allow",
"Principal": "*",
"Action": "sqs:SendMessage",
"Resource": "${aws_sqs_queue.default.arn}",
"Condition": {
"ArnEquals": {
"aws:SourceArn": "${aws_sns_topic.default.arn}"
}
}
}
]
}
POLICY
}
resource "aws_db_event_subscription" "cluster" {
name = "cluster-events"
sns_topic = aws_sns_topic.default.arn
source_type = "db-cluster"
event_categories = [
"failover", "failure", "deletion", "notification"
]
}
resource "aws_db_event_subscription" "instances" {
name = "instances-events"
sns_topic = aws_sns_topic.default.arn
source_type = "db-instance"
event_categories = [
"availability",
"deletion",
"failover",
"failure",
"low storage",
"maintenance",
"notification",
"read replica",
"recovery",
"restoration",
]
}
output "endpoint" {
value = aws_rds_cluster.default.endpoint
}
output "reader-endpoint" {
value = aws_rds_cluster.default.reader_endpoint
}
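To try the example (assuming your AWS credentials for us-east-1 are already configured), the usual Terraform workflow applies; destroy the resources when you are done testing to avoid charges:
terraform init
terraform apply
terraform destroy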
Upvotes: 8