Web Hosting Philippines, Offshore Programming, Offshore SEO Philippines, Cheap Webhosting Manila Philippines
Home -> Resources -> MySQL -> Quick Replication Error Recovery via snapshots

Recovering from MySQL replication errors quickly via snapshots

MySQL replication is great when it works, but when it fails, it can potentially require a lot of time and effort to troubleshoot. We have a master-master replication setup for a database which is relatively small (~10MB) and - knock on wood - there have been no problems with replication yet. But for another, much larger database (a few hundred MB in size), with much greater real-time usage traffic which is just master-slave and in fact, simpler, we have already seen replication fail quite a few times already. These usually have to do with unanticipated shutdowns of either the master or the slave which somehow corrupt the logs. The procedure outlined below suggests a quick way to recover from such situations. It is mainly intended to work with a master-slave setup.

Figuring out the replication error directly is a black art that will consume hours and hours and probably end in utter frustration so once a replication error occurs, the best - and safest - bet is to just nuke the (slave) database in question, restore it from a snapshot backup of the master and restart the slave at the proper position. Even for databases of a few hundred megabytes in size, making a snapshot takes just seconds or a couple of minutes at most, so this should not impact the operation of even a website with a lot of traffic. NOTE: If you have a small database and network traffic is not an issue, you can in fact just start replication from a blank slave database, but for a big working db, it is much faster to take a snapshot and recover from it.

Here is how to do this:

1. Lock the master database, make a snapshot, and know the log position:

mysql> use dbname
mysql> flush tables with read lock
mysql> show master status;
Query OK, 0 rows affected (0.02 sec)

| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
| mysql-bin.000012 | 380594610 |              |                  |
1 row in set (0.00 sec)

2. Dump the database and record the binlog filename and position in the dump file's name:

mysqldump -u user --password=XXXX --opt dbname | gzip > dbname-yyyymmdd-mysql-bin.000012-380594610.sql.gz

Remember to delete the bash history so your password is not exposed:

history | grep password
history -d nnn # where nnn are the lines where the password is showing, do as many
               # times as necessary, from higher numbered lines first

Alternatively, you can perform the above in two steps to avoid showing the password in the bash history

mysqldump -u user -p --opt dbname > dbname-yyyymmdd-mysql-bin.000012-380594610.sql
gzip dbname-yyyymmdd-mysql-bin.000012-380594610.sql

3. Unlock the database, so that operations on it can continue!

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)

4. Now you have a snapshot from which you can restart replication at any time. If for example, replication fails again in the future, you can just nuke the slave database, restore from this snapshot db and restart the slave sync at the recorded position. Here's how to do that. First, stop the slave thread on the slave server:

mysql> SLAVE STOP;

Then, DROP the old database on the slave server and dump the snapshot over to it.

mysql> DROP dbname;

and on the command line,

mysql -u root -p dbname < dbname-yyyymmdd-mysql-bin.000012-380594610.sql # remember to gunzip first!

5. Now, point the slave server to the correct master parameters, and restart the slave thread:

mysql> change master to master_host='your.masterserver.com', master_user='username', master_password='password',
                        master_log_file='mysql-bin.000012', master_log_pos=380594610;
mysql> start slave;

At this point, assuming you had your replication parameters correctly set earlier, your master-slave setup should again be quitely humming along.

IMPORTANT: By default, replication works on a per server basis, so if you have multiple databases on the master, you have to take a snapshot of all the databases on it and drop/restore each one of these to the slave, otherwise the replication will not work. To configure replication to work only with certain databases or tables, see http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html. In our experience, for databases, both replicate-do-db as well as replicate-ignore-db have to be filled with the db names.

 Want to tip via bitcoin?

(address string below for double checking)


© 2017 by Andy Sy
last updated: 2010-Aug-2

Web Development / Rich Internet Applications (RIA) Development

Programming Languages


Database Development

SEO Consultants, Offshore SEO/SEM Philippines
   © 2003-2015 Neotitans Technologies Inc.