Fail over on an Aurora RDS read-only replica with minimal downtime

I'm wondering what the best practice is for handling the failover of an Aurora RDS instance. I have a writer instance with two reader instances attached. The three instances are in three different AZs.

Should I just select my desired instance and click "Failover"? What's the expected downtime? Can I do this while receiving production traffic?

Upvotes: 0

Views: 1340

Answers (3)

Eligio Mariño

Reputation: 372

What's the expected downtime?

The article "Comparing Failover Times for Amazon Aurora, Amazon RDS, and ClusterControl", written in 2019, suggests the failover duration can be around 7 seconds. It describes the steps the authors took and the alternatives they compared.

I needed this information today to schedule downtime for a production application using RDS Aurora MySQL. As of September 2024, I couldn't find estimated numbers in the AWS documentation or blogs on how long the RDS Aurora instance would be down during failover. Fortunately, the total duration was less than 30 seconds when the failover happened. I didn't measure it programmatically, though. I just inspected the application logs.
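For anyone who would rather measure the window programmatically than read application logs, here is a minimal sketch. It assumes that a successful TCP connection to the cluster endpoint is a good enough proxy for "the database is up", which is only an approximation (the port can accept connections before the instance accepts writes). The names `tcp_probe` and `measure_outages` and the endpoint in the usage comment are hypothetical.

```python
import socket
import time
from typing import Callable, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def measure_outages(
    probe: Callable[[], bool],
    attempts: int,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, float]]:
    """Call probe() `attempts` times and return (start, end) outage windows.

    An outage starts at the first failed probe and ends at the next
    successful one, so the resolution is limited by `interval`.
    """
    outages: List[Tuple[float, float]] = []
    down_since = None
    for _ in range(attempts):
        now = clock()
        if probe():
            if down_since is not None:
                outages.append((down_since, now))
                down_since = None
        elif down_since is None:
            down_since = now
        sleep(interval)
    if down_since is not None:
        # Still down when polling stopped: close the window at the current time.
        outages.append((down_since, clock()))
    return outages


# Usage (hypothetical endpoint; start it just before triggering the failover):
# windows = measure_outages(lambda: tcp_probe("my-cluster.example.internal", 3306),
#                           attempts=240)
# for start, end in windows:
#     print(f"down for {end - start:.1f}s")
```

Running a real query through the application's own driver instead of a bare TCP connect would give a number closer to what users experience.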

Should I just select my desired instance and click "Failover"? Can I do this while receiving production traffic?

I clicked "Failover" on some RDS Aurora instances while they were receiving production traffic, and the downtime was less than 30 seconds. For more details on how to perform the failover, you can follow the AWS guide "Failing over an Amazon Aurora DB cluster".
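The same action can also be requested through the RDS API instead of the console. The sketch below assumes a boto3 RDS client and uses its `failover_db_cluster` call; the wrapper name `request_failover` and the identifiers in the usage comment are hypothetical.

```python
from typing import Any, Optional


def request_failover(
    rds_client: Any,
    cluster_id: str,
    target_instance_id: Optional[str] = None,
) -> Any:
    """Ask RDS to fail over an Aurora cluster.

    rds_client is expected to be a boto3 RDS client, e.g. boto3.client("rds").
    If target_instance_id is omitted, the target is left for RDS to choose.
    """
    kwargs = {"DBClusterIdentifier": cluster_id}
    if target_instance_id is not None:
        # Name the reader that should be promoted to writer.
        kwargs["TargetDBInstanceIdentifier"] = target_instance_id
    return rds_client.failover_db_cluster(**kwargs)


# Usage (hypothetical identifiers; requires boto3 and AWS credentials):
# import boto3
# request_failover(boto3.client("rds"), "my-aurora-cluster", "my-reader-1")
```

The call only starts the failover; it does not wait for the new writer to accept connections.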

Upvotes: 0

BrianC

Reputation: 1822

Redundancy is already built into the Aurora service. In fact, it is redundant several times over. If one instance fails, it takes about 30 seconds to switch over to another data center.

https://aws.amazon.com/rds/aurora/

Amazon Aurora's storage is fault-tolerant and self-healing. Six copies of your data are replicated across three Availability Zones and continuously backed up to Amazon S3

Upvotes: 0

Henry

Reputation: 1686

Failover is in principle instant: the very first thing AWS does is update the DNS record so that it points to the failover instance.
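To see when a client actually picks up the new DNS record, one option is to poll the endpoint and note when the resolved addresses change. This is a minimal sketch using only the standard library; `resolve_ipv4` and `watch_dns` are hypothetical names, and the result reflects whatever the local resolver and its caches return, not the authoritative record.

```python
import socket
import time
from typing import Callable, FrozenSet, List


def resolve_ipv4(host: str) -> FrozenSet[str]:
    """Return the set of IPv4 addresses the local resolver gives for host."""
    infos = socket.getaddrinfo(
        host, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return frozenset(info[4][0] for info in infos)


def watch_dns(
    host: str,
    polls: int,
    interval: float = 1.0,
    resolve: Callable[[str], FrozenSet[str]] = resolve_ipv4,
    sleep: Callable[[float], None] = time.sleep,
) -> List[FrozenSet[str]]:
    """Poll the resolver and return each distinct address set, in the order seen."""
    seen: List[FrozenSet[str]] = []
    for _ in range(polls):
        addrs = resolve(host)
        if not seen or seen[-1] != addrs:
            seen.append(addrs)
        sleep(interval)
    return seen


# Usage (hypothetical endpoint):
# print(watch_dns("my-cluster.example.internal", polls=60))
```

If the application or its runtime caches DNS results for longer than the record's TTL, it can keep connecting to the old address after the record has changed.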

One thing to be aware of, though, is that a read replica is written to asynchronously, not synchronously, which means it will lag your main database by some amount.

If you really want to do this whilst receiving production traffic, you can in principle, but you need to make sure that every application will reconnect to the database.
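The reconnect requirement can be sketched as a retry wrapper with exponential backoff. This is a minimal, driver-agnostic illustration rather than any particular library's API: `call_with_reconnect`, `connect`, and `operation` are hypothetical names, and the default `retryable` tuple is an assumption, since a real database driver raises its own exception types that would need to be passed in.

```python
import time
from typing import Any, Callable, Tuple, Type


def call_with_reconnect(
    connect: Callable[[], Any],
    operation: Callable[[Any], Any],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run operation(conn), reconnecting with backoff on retryable errors."""
    conn = None
    for attempt in range(1, max_attempts + 1):
        try:
            if conn is None:
                conn = connect()
            return operation(conn)
        except retryable:
            # The connection may be dead or pointing at the old writer:
            # drop it so the next attempt opens a fresh one.
            close = getattr(conn, "close", None)
            if callable(close):
                try:
                    close()
                except Exception:
                    pass
            conn = None
            if attempt == max_attempts:
                raise
            # Delays grow 0.5s, 1s, 2s, ... and are capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

With the defaults, the wrapper waits up to 0.5 + 1 + 2 + 4 + 8 = 15.5 seconds in total before giving up, so `max_attempts` would need to be raised to ride out a failover closer to 30 seconds. Retrying is only safe for operations that can be repeated without side effects.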

Upvotes: 1
