Sunday, February 17, 2008

Why do I have so many hard drives?

There are five hard drives in my main computer. There is no RAID setup. Why?

Hard drives fail. I've had the drive holding my root partition fail more than once. When that happens, I used to restore from backup. I would make a backup tape at least once a week, but a badly timed disk failure could still result in the loss of a lot of work.

My solution to this has been to buy my hard drives in matched pairs. I partition them equally, format them the same way, and install them both in the computer. One of them is the live disk, the other is the spare. The spare is kept unmounted and spun down. Every night around 3:00 AM, a cron job spins up the spares drives. Then, one partition at a time is fsck-ed, mounted, and copied to. The shell script uses rdist to synchronize the contents of the two partitions. Finally, I take special care to make the backup drive bootable. I use the LILO boot loader, so, when the root partition is mounted under /mnt/backup, the script executes the command:
/sbin/lilo -r /mnt/backup -b /dev/sdc

which, on my system, writes the LILO boot magic to the backup boot drive, which appears as /dev/sdc when it is the spare in my system. My lilo.conf file, on both the live system and the spare, refer to the boot drive as being /dev/sda, but this '-b' switch overrides that, so that the information is written to the boot block of the current /dev/sdc, but is written so that is appropriate for booting the device at /dev/sda (which it will appear to be should my live boot drive fail and be removed from the system).

Next, I use volume labels to mount my partitions. You can't have duplicate labels in the system, so my spare drive has labels with the suffix "_bak" applied. That means that the /etc/fstab file suitable for the live drive would not work if the spare were booted with that fstab. To solve this problem, the copying script runs this command after it finishes copying the files in /etc:
sed -e 's|LABEL=\([^ \t]*\)\([ \t]\)|LABEL=\1_bak\2|' /etc/fstab > /mnt/backup/etc/fstab

which has the effect of renaming the labels in the fstab to their versions with the _bak suffix, so they match the volume partitions on the spare hard drive.

OK, that sounds like a lot of work, why do I do it? What does it buy me?

First of all, it gives me automatic backups. Every night, every file is backed up. When I go to the computer at the beginning of the day, the spare drive holds a copy of the filesystem as it appeared when I went to sleep the night before. Now, if I do something really unwise, deleting a pile of important files, or similarly mess up the filesystem, I have a backup that I haven't deleted. If I were to use RAID, deleting a file would delete it immediately from my backup, which isn't what I want. As long as I realize there's a problem before the end of the evening, I can always recover the machine to the way it looked before I started changing things in the morning. If I don't have enough time to verify that the things I've done are OK, I turn off the backup for a night by editing the script.

Another important thing it allows me to do is to test really risky operations. For instance, replacing glibc on a live box can be tricky. In recent years, the process has been improved to the point that it's not really scary to type "make install" on a live system, but ten years ago that would almost certainly have confused the dynamic linker enough that you would be forced to go to rescue floppies. Now, though, I can test it safely. I prepare for the risky operation, and then before doing it, I run the backup script. When that completes, I mount the complete spare filesystem under a mountpoint, /mnt/chroot. I chroot into that directory, and I am now running in the spare. I can try the unsafe operation, installing a new glibc, or a new bash, or something else critical to the operation of the Linux box. If things go badly wrong, I type "exit", and I'm back in the boot drive, with a mounted image of the damage in /mnt/chroot. I can investigate that filesystem, figure out what went wrong and how to fix it, and avoid the problem when the time comes to do the operation "for real". Then, I unmount the partitions under /mnt/chroot and re-run my backup script, and everything on the spare drive is restored. Think of it as a sort of semi-virtual machine for investigating dangerous filesystem operations.

The other thing this gives me is a live filesystem on a spare drive. When my hard drive fails (not "if", "when", your hard drive will fail one day), it's a simple matter of removing the bad hardware from the box, re-jumpering the spare if necessary, and then rebooting the box. I have had my computer up and running again in less than ten minutes, having lost, at most, the things I did earlier in the same day. While you get this benefit with RAID, the other advantages listed above are not easily available with RAID.

Of course, this is fine, but it's not enough for proper safety. The entire computer could catch fire, destroying all of my hard drives at once. I still make periodic backups to writable DVDs. I use afio for my backups, asking it to break the archive into chunks a bit larger than 4 GB, then burn them onto DVDs formatted with the ext2 filesystem (you don't have to use a UDF filesystem on a DVD, ext2 works just fine, and it's certain to be available when you're using any rescue and recovery disk). Once I've written the DVDs, I put them in an envelope, mark it with the date, and give it to relatives to hang onto, as off-site backups.

So, one pair of drives is for my /home partition, one pair for the other partitions on my system. Why do I have 5 drives? Well, the fifth one isn't backed up. It holds large data sets related to my work. These are files I can get back by carrying them home from the office on my laptop, so I don't have a backup for this drive. Occasionally I put things on that drive that I don't want to risk losing, and in that case I have a script that copies the appropriate directories to one of my backed-up partitions, but everything else on that drive is expendable.

There are two problems that can appear with large files.
  • rdist doesn't handle files larger than 2 GB. I looked through the source code to see if I could fix that shortcoming, and got a bit worried about the code. So I'm working on writing my own replacement for rdist with the features I want. In the mean time, I rarely have files that large, and when I do, they don't change often, so I've been copying the files to the backup manually.
  • Sometimes root's shells, even those spawned by cron, have ulimit settings. If you're not careful, you'll find that cron jobs cannot create a file in excess of some maximum size, often 1 GB. This is an inconvenient restriction, and one that I have removed on my system.

No comments: