Backing Up Two Ways from Sunday
One method of backup or recovery isn’t enough. Period. No matter what anyone tells you, what the book says, what your boss says, or what you think you need, you need to be backing things up in many ways.
Here are a few examples.
MySQL
Theoretically, you could recover anything you needed from the binary log, as long as you’ve got a good starting point and a good ending point. (This, by the way, is a good reason to flush the binary logs and take a backup on a regular basis.) What if your binary log’s corrupted, though? You need to fall back to a full SQL backup … which you’re doing regularly, right?
If your binary log is corrupted, any mirrors you are using that are based on that binary log are corrupted as well.
Case in point: I had a client with a very active, very large database, north of 15GB in InnoDB. The binary log hit a bug and corrupted itself. The backups were being taken from a mirror so that they didn’t interrupt the main machine’s processing, but the client only kept a few days’ worth, so we couldn’t use those backups to restore. The most recent uncorrupted dump from the main machine had been taken three months before. Luckily, the client had done some application-level backups to an XML format, and we were able to (laboriously) restore from that. It cost about $3,000 because they didn’t want to degrade their forum’s performance for a half hour every night and pay for an extra TB or so of storage to keep more than a few days’ worth of backups.
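To make "flush the binary logs and take a backup on a regular basis" concrete, here is a minimal sketch of a nightly full dump. It is not what the client above ran. The option file path, the backup directory, and the function names are all made up for illustration. It assumes the stock mysqldump client, with binary logging enabled on the server and credentials kept in the option file. On newer MySQL releases --master-data has been renamed --source-data, so check which one your version wants.

```python
import datetime
import subprocess
from pathlib import Path


def build_dump_command(defaults_file: str) -> list[str]:
    """Build a mysqldump command line for a full logical backup."""
    return [
        "mysqldump",
        # Credentials live in an option file; this option must come first.
        f"--defaults-extra-file={defaults_file}",
        "--all-databases",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--flush-logs",          # start a fresh binary log at dump time
        "--master-data=2",       # record the binlog position as a comment in the dump
    ]


def dump_path(backup_dir: Path, day: datetime.date) -> Path:
    """One dated file per day, so that retention is just a matter of deleting old files."""
    return backup_dir / f"full-{day.isoformat()}.sql"


def run_full_backup(backup_dir: Path, defaults_file: str) -> Path:
    """Run the dump and return the path of the file that was written."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = dump_path(backup_dir, datetime.date.today())
    with open(target, "wb") as out:
        # check=True raises if mysqldump exits non-zero, so failures are not silent.
        subprocess.run(build_dump_command(defaults_file), stdout=out, check=True)
    return target
```

Something like run_full_backup(Path("/backups/mysql"), "/etc/mysql/backup.cnf") from a nightly cron job would do it. Each dump plus the binary logs written after it gives you a known starting point, and how many of those dated files you keep is exactly the retention decision that bit the client above.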
Servers
Scenario: Hard drive gets corrupted or dies. You need to get the machine back up quickly. You have a snapshot of the machine … but your snapshot is on the same storage as that machine unless you back it up somewhere else.
On top of that, storage requirements have been growing rapidly for servers. Where a Linux server can take less than 1GB, Windows Server 2008 R2 can take up 20GB with system files alone. (In fact, if you plan to have any data on that server, or keep any logs, we’d recommend going with 40GB minimum for your C: drive.) It’s important to back that up to something that’s not on the same system disks.
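Better yet, take a hint from the application-level backups and back up your registry, configuration files, and data separately from the snapshot. We tend to use rsync for this role and put it in a rolling-backup mode with the --link-dest option to ease recovery. With --link-dest, files that haven’t changed since the previous run are hard-linked rather than copied, so every backup directory looks like a complete copy while only changed files take up new space.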
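Here is a rough sketch of what a rolling backup with --link-dest can look like. It is not our production script. The one-directory-per-day layout, the paths, and the function names are assumptions for illustration. It also assumes the backup root contains nothing but those dated directories and that rsync is on the PATH.

```python
import datetime
import subprocess
from pathlib import Path
from typing import Optional


def latest_snapshot(backup_root: Path, before: datetime.date) -> Optional[Path]:
    """Return the newest dated directory older than `before`, or None if there is none."""
    if not backup_root.is_dir():
        return None
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    older = sorted(p for p in backup_root.iterdir()
                   if p.is_dir() and p.name < before.isoformat())
    return older[-1] if older else None


def build_rsync_command(source: str, backup_root: Path, day: datetime.date,
                        previous: Optional[Path]) -> list[str]:
    """Build the rsync command for one day's snapshot."""
    target = backup_root / day.isoformat()
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Use an absolute path here: a relative --link-dest is resolved
        # against the destination directory, not the current directory.
        cmd.append(f"--link-dest={previous}")
    # The trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", str(target) + "/"]
    return cmd


def run_backup(source: str, backup_root: Path) -> None:
    """Take today's snapshot, hard-linking unchanged files to the previous one."""
    today = datetime.date.today()
    backup_root.mkdir(parents=True, exist_ok=True)
    previous = latest_snapshot(backup_root, today)
    subprocess.run(build_rsync_command(source, backup_root, today, previous), check=True)
```

Recovery is then just a copy out of whichever dated directory you want, with no chain of incrementals to replay. Point the backup root at storage that is not the machine's own system disks, for the reasons above.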
VMware
Same principle as above. Snapshots are usually stored in the same datastore. Datastore goes bye-bye, so do your snapshots.
There are some great products out there that can really help with this issue. The one we use is Veeam Backup & Replication. It can be used to replicate a snapshot to another VMware cluster, or back up the datastore files at a consistent snapshot point and then copy them elsewhere all in one step. We use a two-step process: we keep the backups locally on the backup server and also transmit them to another datacenter across campus.
When using Veeam with Windows, make sure that VMware Tools is installed and that you enable the VSS integration. (You’ll also need to make sure that the administrative share option on the system drives is enabled, and that the appropriate firewall ports are opened.) This ensures that you’ve got a transactionally consistent backup snapshot.
Practice, practice, practice
The only way to make sure that you can recover from a disaster is to test recovering from a disaster. At least once a year, we practice recovering from a worst-case scenario. That means bringing up a new machine from scratch, re-implementing all of the options and configurations, and then restoring the data. That kind of restoration should never be necessary, but it happens. Practice gives you insights into how to improve the processes. It also turns a recovery operation from an expensive nightmare that sets back all of your other work into something that you can execute quickly and professionally.