Kay
Kay

Reputation: 19690

How to set binary log retention in aws rds mysql

I want to set the binary log retention in aws rds for mysql. It looks like there are 2 places i can do this,

  1. via a procedural call

CALL mysql.rds_set_configuration(name,value);

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/mysql_rds_set_configuration.html

enter image description here

  1. Or via the param option group

enter image description here

If the values are different which takes precedent?

side question: the value in the param group is the default one aws rds sets (30) days. Is there some way i can know/measure how long i would need the binary log for?

Upvotes: 2

Views: 3439

Answers (1)

Bill Karwin
Bill Karwin

Reputation: 562701

I don't know which one takes precedence, but you should be able to check that yourself easily by using mysql.rds_show_configuration(). Change the parameter group setting, and then check if that affects the value when you show configuration using the procedure.

I have an educated guess that the parameter group is like changing the persistent setting in the my.cnf file, and changing the setting using the procedure is like using SET GLOBAL, which only lasts until you restart the instance. Then it reads the persistent setting from the parameter group and forgets that you changed it with the procedure.

Is there some way i can know/measure how long i would need the binary log for?

You can't know for certain how much binlog retention you need.

That statement needs some explanation.

Binary logs are used for:

  • Point in time recovery. For this to work, you need a full backup, and enough binlogs to replay events since the backup. So you need binlog retention only to the most recent backup. The frequency of backups is up to you. If too much time passes and there isn't a continuous set of binlogs to do PITR, then no problem — just create a new full backup.

  • Replication. Normally, a replica downloads binlogs nearly immediately, so the binlog retention can be very low. But if a replica is offline for some time, it's good that the binlogs stay on the source instance. So the binlog retention has to be long enough so there aren't any binlogs missed when the replica comes back online. If the replica misses any binlogs because they expired, then the replica cannot catch up. It must be wiped and reinitialized from a new backup.

  • CDC using tools like Debezium. Similar pattern as replication. If the CDC runs periodically, there needs to be enough binlog retention to cover the period between CDC. This could be seconds, or days. It's up to you to determine how often this runs.

So how can the source instance know if a replica or a CDC client has disconnected, when it might come back to download more? It can't know. Perhaps those clients were decommissioned and will never reconnect. The source instance storing the binlogs has no way of knowing.

So it's up to you to know things like how long your replica will be offline.

Or stated another way, if you have X days of binlog retention, it's up to you to make sure the replica is back up before the binlogs it needs start expiring.

If you can't do that, then the replica needs to be reinitialized. At my last DBA job, we had so many replicas that were offline for days due to server failures, we had to reinitialize replicas at least once a week.

Upvotes: 1

Related Questions