RDS - max replica lag when using read replica

Question

I am using RDS's read replica mechanism for a schema update to a very large Mysql table. I ran an Alter command which locks the table for a long period of time (more than 24 hours). In that period of time my read replica was not getting updated and I noticed the Replica lag value was slowly increasing.

When the table update was complete I saw that the Replica lag was slowly decreasing until the read replica finally caught up with the original DB.

While my Alter command was running, I did a small experiment and occasionally updated a specific row so I can follow it on my read replica. My experiment showed that the updates to this specific row indeed eventually happened also in the read replica (after the table was unlocked).

Based on the above experiment result I assume all updates that were blocked while my read replica was updating eventually were also performed on my replicated DB after the table modification but it would be hard to prove something like that for such a big table and such a long period of time.

I couldn't find any official documentation on how this mechanism works and I was wondering where exactly all these updates are buffered and what would be the limit of this buffer (e.g. when will I start loosing changes that occured on my master DB)?

Michael - sqlbot · Accepted Answer

This is covered in the documentation. Specifically, the replica ("slave") server's relay log is the place where the changes usually wait until they are actually executes on the replica.

http://dev.mysql.com/doc/refman/5.6/en/slave-logs.html

But, the limit to how far behind a replica can be -- but still, eventually, have data identical to the master -- is a combination of factors. It should not ever quietly "misplace" any of the buffered changes, as long as it's being monitored.

Each time the data on the master database changes, the master writes a replication event to its binary log, and these logs are delivered to the replica, usually in near-real-time, where they are stored, pretty much as-sent, in the relay logs, as the first step in a 2-step process.

The second step is for the replica to read through those logs, sequentially, and modify its local data set, according to what the master server sent. The statements are typically executed sequentially.

The two biggest factors that determine how far behind a replica can safely become are the amount of storage available for relay logs on the replica and the amount of storage plus log retention time on the master. RDS has additional logic on top of "stock" MySQL Server to prevent the master from purging its copy of the log until the replica(s) have received them.

RDS - max replica lag when using read replica

Answers (1)

Related Questions