Reputation: 13
We have a "1 master, 1 slave" MySQL setup. We had a sudden power outage that took down the slave. After getting the machine back up, I found that the slave was out of sync with the master:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.1
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-log.001576
Read_Master_Log_Pos: 412565824
Relay_Log_File: mysqld-relay-bin.002671
Relay_Log_Pos: 6930
Relay_Master_Log_File: mysql-log.001573
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table: blah.table2
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Skip_Counter: 0
Exec_Master_Log_Pos: 689030864
Relay_Log_Space: 2944772417
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Update_rows event on table blah.info; Can't find record in 'info', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-log.001573, end_log_pos 689031225
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)
We're using a binlog format of "ROW", so when I try to use mysqlbinlog to look at the offending row, I don't see anything of use. I don't want to simply set the skip counter, because I think that would throw my table even further out of sync.
Is there anything I can do on the slave that would essentially "roll back" to a given point in time, where I could then reset the master log number, poition, etc? If not, is there anything at all that I can do to get back in sync?
Upvotes: 1
Views: 11670
Reputation: 562328
One can usually recover from small discrepancies using pt-table-checksum and pt-table-sync.
It looks to me like your replica lost its place in the binary log sequence when it crashes. The replica continually writes its last processed binlog event into datadir/relay-log.info, but this file uses buffered writes, so it is susceptible to losing data in a crash.
That's why Percona Server created a crash-resistant replication feature to store the same replica info in an InnoDB table, to recover from this scenario.
MySQL 5.6 has implemented a similar feature: you can set relay_log_info_repository=TABLE
so the replica saves its state in a crash-resistant way.
Re your comment:
Yes, in theory pt-table-sync can fix any amount of replication drift, but it's not necessarily the most efficient way to correct large discrepancies. At some point, it's quicker and more efficient to trash the outdated replica and reinitialize it using a new backup from the master.
Check out How to setup a replica for replication in 6 simple steps with Percona Xtrabackup.
Upvotes: 1