Backup Gods

A couple of days ago I woke up and looked across the room to see a red light flashing above one of the disk bays of my NAS server. Drowsily I reach for my laptop, open an SSH session with the server and reboot it. It’s back up surprisingly quickly and the red light has turned green. Panic over; false alarm, go back to sleep. And that’s precisely when I wake up again to see the red light still taunting me from across the room.
I dream about a couple more best-case fixes for the faulty disk before actually getting up to investigate the problem. Still hoping for a quick fix so I can get on with my day, I pull the drive from its bay in the server and re-insert it; the drive spins up but does nothing after that. I check the available volumes on the server: three 3TB hard disks and one 3.86GB volume in the faulty disk’s bay. I head to Amazon and buy an exact replacement for the failed drive, but that hasn’t satisfied my curiosity. I grab a USB-to-TTL serial adapter and solder some crimp connectors to a pin header so that I can connect to the small serial port on the disk’s controller board. A quick Google search returns the drive’s serial port configuration and in a couple of minutes I’m greeted by an error message:
fail servo op=0100 resp=0003
A mechanical failure. How very dull.

Having used a number of different machines and a considerable amount of storage space for most of my life, I’ve encountered my fair share of disk failures. Normally there’s a SMART warning just before the disk starts refusing to work as a disk, or one day it starts making an odd noise. But at this moment I remember a failure which I could never get my head around:
At age 15 I booted my main desktop PC one day to find that the primary hard disk no longer showed up. I’d not moved the machine or done anything to provoke a mechanical failure, the disk was less than a year old and there were no other obvious faults with the PC to indicate a bigger problem. I’m well aware that this sort of failure is far from impossible, but I wasn’t satisfied. The disk would spin up (and sounded fine) but everything I tried connecting it to refused to acknowledge it as a SATA device. I tried swapping the controller with that of another hard disk of the same model, still no luck. I lost interest and put the disk in a box of old hardware.

Concluding that the NAS disk is a lost cause, I locate the box which became the five-year-old disk’s resting place. I suspect that five years of being shoved around will have killed it off completely but I’m still a little curious. Searching the part number reveals that there was a known fault with the disk, for which the manufacturer, Seagate, had offered a data recovery service a few years ago. A bit more searching points me in the direction of a fix I can carry out over the serial port, so I hook the disk up to the USB-to-TTL adapter and connect to it. To my surprise it spins up and starts talking back to my serial terminal.
A few minutes later the disk is working and showing up as three partitions in Windoze. The music and MSN Messenger chat logs which seemed so important five years ago are of little interest to me now, but the situation amuses me somewhat: the failure of this disk was what pushed me into using RAIDs to protect against hardware failures, which is the very reason my most recent failure cost me no more than the price of a replacement hard disk. I feel the backup gods are recognising my progress. Or perhaps daring me to continue running my NAS volume in degraded mode.
