Recovering from MySQL replication errors quickly via snapshots
MySQL replication is great when it works, but when it fails, it can potentially require a lot of time and effort to troubleshoot. We have a master-master replication setup for a database which is relatively small (~10MB) and - knock on wood - there have been no problems with replication yet. But for another, much larger database (a few hundred MB in size), with much greater real-time usage traffic which is just master-slave and in fact, simpler, we have already seen replication fail quite a few times already. These usually have to do with unanticipated shutdowns of either the master or the slave which somehow corrupt the logs. The procedure outlined below suggests a quick way to recover from such situations. It is mainly intended to work with a master-slave setup.
Figuring out the replication error directly is a black art that will consume hours and hours and probably end in utter frustration so once a replication error occurs, the best - and safest - bet is to just nuke the (slave) database in question, restore it from a snapshot backup of the master and restart the slave at the proper position. Even for databases of a few hundred megabytes in size, making a snapshot takes just seconds or a couple of minutes at most, so this should not impact the operation of even a website with a lot of traffic. NOTE: If you have a small database and network traffic is not an issue, you can in fact just start replication from a blank slave database, but for a big working db, it is much faster to take a snapshot and recover from it.
Here is how to do this:
1. Lock the master database, make a snapshot, and know the log position:
mysql> use dbname mysql> flush tables with read lock mysql> show master status; Query OK, 0 rows affected (0.02 sec) +------------------+-----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +------------------+-----------+--------------+------------------+ | mysql-bin.000012 | 380594610 | | | +------------------+-----------+--------------+------------------+ 1 row in set (0.00 sec)
2. Dump the database and record the binlog filename and position in the dump file's name:
mysqldump -u user --password=XXXX --opt dbname | gzip > dbname-yyyymmdd-mysql-bin.000012-380594610.sql.gz
3. Unlock the database, so that operations on it can continue!
mysql> unlock tables; Query OK, 0 rows affected (0.00 sec)
4. Now you have a snapshot from which you can restart replication at any time. If for example, replication fails again in the future, you can just nuke the slave database, restore from this snapshot db and restart the slave sync at the recorded position. Here's how to do that. First, stop the slave thread on the slave server:
mysql> SLAVE STOP;
Then, DROP the old database on the slave server and dump the snapshot over to it.
mysql> DROP dbname;
and on the command line,
mysql -u root -p dbname < dbname-yyyymmdd-mysql-bin.000012-380594610.sql # remember to gunzip first!
5. Now, point the slave server to the correct master parameters, and restart the slave thread:
mysql> change master to master_host='your.masterserver.com', master_user='username', master_password='password', master_log_file='mysql-bin.000012', master_log_pos=380594610; mysql> start slave;
At this point, assuming you had your replication parameters correctly set earlier, your master-slave setup should again be quitely humming along.
IMPORTANT: By default, replication works on a per server basis, so if you
have multiple databases on the master, you have to take a snapshot of
all the databases on it and drop/restore each one of these to the slave, otherwise
the replication will not work. To configure replication to work only
with certain databases or tables, see
In our experience, for databases, both
© 2019 by Andy Sy