Reputation: 6953
I am trying to create an RDS Aurora MySQL cluster in AWS using Terraform. However, I notice that any time I alter the cluster in a way that requires it to be replaced, all data is lost. I have configured the cluster to take a final snapshot and would like to restore from that snapshot, or restore the original data through some alternative measure.
Example: Change cluster -> TF destroys the original cluster -> TF replaces it with a new cluster -> Restore data from the original
I have attempted to use the same snapshot identifier for both `aws_rds_cluster.snapshot_identifier` and `aws_rds_cluster.final_snapshot_identifier`, but Terraform bombs because the final snapshot of the destroyed cluster doesn't exist yet.
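For illustration, this is roughly the shape of that failing attempt (cluster and snapshot names here are placeholders, not my actual configuration):

```hcl
# Sketch of the failing attempt: both identifiers point at the same
# snapshot name. On replacement, Terraform tries to restore the new
# cluster from "example-final" before the destroyed cluster has
# actually written that final snapshot, so the plan/apply fails.
resource "aws_rds_cluster" "example" {
  cluster_identifier        = "example-cluster"
  engine                    = "aurora-mysql"
  skip_final_snapshot       = false
  final_snapshot_identifier = "example-final"
  snapshot_identifier       = "example-final" # fails: snapshot doesn't exist yet
}
```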
I've also attempted to use the rds-finalsnapshot module, but it turns out it is primarily used for spinning environments up and down while preserving the data, i.e. destroying an entire cluster and then recreating it as part of a separate deployment. (Module: https://registry.terraform.io/modules/connect-group/rds-finalsnapshot/aws/latest)
module "snapshot_maintenance" {
  source                        = "connect-group/rds-finalsnapshot/aws//modules/rds_snapshot_maintenance"
  identifier                    = local.cluster_identifier
  is_cluster                    = true
  database_endpoint             = element(aws_rds_cluster_instance.cluster_instance.*.endpoint, 0)
  number_of_snapshots_to_retain = 3
}
resource "aws_rds_cluster" "provisioned_cluster" {
  cluster_identifier                  = module.snapshot_maintenance.identifier
  engine                              = "aurora-mysql"
  engine_version                      = "5.7.mysql_aurora.2.10.0"
  port                                = 1234
  database_name                       = "example"
  master_username                     = "example"
  master_password                     = "example"
  iam_database_authentication_enabled = true
  storage_encrypted                   = true
  backup_retention_period             = 2
  db_subnet_group_name                = "example"
  skip_final_snapshot                 = false
  final_snapshot_identifier           = module.snapshot_maintenance.final_snapshot_identifier
  snapshot_identifier                 = module.snapshot_maintenance.snapshot_to_restore
  vpc_security_group_ids              = ["example"]
}
What I find is that if a change requires destroy and recreation, I don't have a great way to restore the data as part of the same deployment.
I'll add that I don't think this is an issue with my code; it's more of a lifecycle limitation of TF. I can't be the only person who wants to preserve the data in their cluster in the event TF determines the cluster must be recreated.
If I wanted to prevent loss of data due to a change to the cluster that results in a destroy, would I need to destroy the cluster outside of Terraform (or through the CLI), sync up Terraform's state, and then apply?
Upvotes: 2
Views: 3338
Reputation: 6953
The solution ended up being rather simple, albeit obscure. I tried over 50 different approaches using combinations of existing resource properties, provisioners, null resources (with triggers) and external data blocks with AWS CLI commands and Powershell scripts.
The challenge here was that I needed to ensure the provisioning happened in this order to ensure no data loss:
1. Destroy the original cluster, taking a final snapshot on the way down.
2. Create the replacement cluster, using snapshot_identifier to specify the snapshot taken in the previous step.
Of course, these steps were based on how Terraform decided it needed to apply updates. It might determine it only needed to perform an in-place update; that wasn't my concern. I needed to handle the scenarios where the resources were destroyed.
The final solution was to eliminate the use of external data blocks where possible and go exclusively with local provisioners, because external data blocks would execute even when only running `terraform plan`. I used the local provisioners to tap into lifecycle events like "create" and "destroy" to ensure my PowerShell scripts would only execute during `terraform apply`.
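As a sketch of that lifecycle wiring (the resource and script names below are illustrative, not my actual files): a `local-exec` provisioner runs at create time by default, and `when = destroy` switches it to destroy time.

```hcl
resource "null_resource" "lifecycle_hooks" {
  # Runs when the resource is created (the default for local-exec).
  provisioner "local-exec" {
    command     = "/powershell_scripts/on_create.ps1"
    interpreter = ["PowerShell"]
  }

  # Runs only when the resource is destroyed.
  provisioner "local-exec" {
    when        = destroy
    command     = "/powershell_scripts/on_destroy.ps1"
    interpreter = ["PowerShell"]
  }
}
```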
On my cluster, I set both `final_snapshot_identifier` and `snapshot_identifier` to the same value.
final_snapshot_identifier = local.snapshot_identifier
snapshot_identifier       = data.external.check_for_first_run.result.isFirstRun == "true" ? null : local.snapshot_identifier
Since `snapshot_identifier` is only set after the first deployment, an external data block allows me to check whether the resource already exists in order to achieve the condition. The condition is necessary because on a first deployment the snapshot won't exist, and Terraform would otherwise fail during the "planning" step because of this.
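A minimal sketch of that check (the script path and JSON key are assumptions for illustration; `data.external` only requires a program that prints a JSON object of string values):

```hcl
data "external" "check_for_first_run" {
  # The script queries AWS (e.g. via the AWS CLI) for the cluster or
  # snapshot and must print a JSON object of strings, for example
  # {"isFirstRun": "true"} when nothing exists yet.
  program = ["PowerShell", "/powershell_scripts/check_for_first_run.ps1"]
}
```

The referenced script just needs to emit that one-key JSON object on stdout; anything else (non-zero exit, non-JSON output) fails the plan.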
Then I execute a PowerShell script in a local provisioner on "destroy" to stop any DMS tasks and then delete the snapshot named `local.snapshot_identifier`.
provisioner "local-exec" {
  when        = destroy
  # First, stop the inflow of data to the cluster by stopping the DMS tasks.
  # We've tricked TF into thinking the snapshot we want to use is there by
  # using the same name for the old and new snapshots, so before we destroy
  # the cluster, we need to delete the original.
  # TF will then create the final snapshot immediately after this script
  # executes, and it will be used to restore the cluster since we've set it
  # as snapshot_identifier.
  command     = "/powershell_scripts/stop_dms_tasks.ps1; aws rds delete-db-cluster-snapshot --db-cluster-snapshot-identifier benefitsystem-cluster"
  interpreter = ["PowerShell"]
}
This clears out the last snapshot and allows Terraform to create a new final snapshot by the same name as the original, just in time to be used to restore from.
Now, I can run Terraform the first time and get a brand-new cluster. All subsequent deployments will restore from the final snapshot, and the data is preserved.
Upvotes: 4