This story is far enough in my past now that I can finally tell it. To give some background, I was working at a university doing desktop support. THAT ALONE probably fills a few of you with dread. But to be specific, I was working for the nice folks in Dewey Hall. I say “nice folks” because they had the decency to hire me and set aside some of their budget for my salary. The deal was, I worked for them, but if my schedule allowed, I was to help out at the neighboring buildings, Screwem Hall and Howe Hall. Those buildings contributed nothing to IT and basically got free support.
I get called over to Screwem Hall, and a tearful lady tells me that her “backup drive” isn’t working. Because this isn’t technically my building, I have no idea what she’s talking about.
It turns out that, years ago, before my time, before my co-worker’s time, before my boss’s time, in fact, before anyone I knew actually worked there, someone had set this lady up with a four-disk RAID-1 array. A bad storm had come through during the night, and this morning… it no workeez. Bear in mind, at this point, you readers know more than I did while this lady was breaking into hysterics about her years of lost work.
“It was supposed to be automatically backed up! That’s the entire purpose of that thing! And it’s supposed to do it automatically, I’ve never had to touch it! YEARS OF WORK ON THERE! IF IT’S LOST I’M GOING TO COLLAPSE!”
Again, this is not my territory and in all my time here, no one ever alerted me to this drive’s existence. So, I sit down and try to assess the situation… and as you’d all agree, the ideal time to analyze a system is NOT after it’s already had a massive failure.
…Turns out, this is a four-bay enclosure with four 1TB drives, configured as a 2TB volume with a 2TB redundant copy. Obviously, if a disk failed, the data could be reconstructed from the redundant copy.
…Except it wasn’t one disk that had failed, it was two. So her chances of success now hinge on the two failed drives NOT being copies of the same data. I decided to look into this possibility.
…Except that the array’s management software hadn’t been loaded onto this computer, since the array had been there so long the user had replaced their computer in the meantime, and no one but the original tech even knew the management software existed. So I decide to access the menu of the drive directly.
…Except it’s a painful cluster of menus, and I’m afraid to even turn the damn thing off and on, for fear of making any data corruption worse than it already is. The only thing I CAN tell is that it’s set to automatically rebuild, so if the data is all still on there, it’ll go into auto-pilot mode.
So I tell her the short version: that her main chance of success is to hope this thing works as designed. I tell her to go buy me two appropriately sized drives and let me install them. She does so, I install them, and I get nothing. No data comes back.
RAID LADY LOSES HER FUCKING MIND.
There are words. There are noises. There are screams. There are tears. And that’s just for starters. She calls my boss, reports me, announces a vendetta against our whole department. Without batting an eye, I calmly list the following:
I had no idea this device existed in our environment, therefore, I had never been able to do any preventative maintenance on it.
Whoever set it up did so in a highly questionable manner. First, they did not put it on a UPS, which made electrical damage from the storm that much more likely. That was stupid.
Second, they used twice as many hard drives as the capacity required. Even years ago, 2TB hard drives weren’t that expensive, least of all if you’re charging them to an academic budget. By doubling the number of disks, they doubled the potential points of failure. Stupid.
Third, all the hard drives were the same model and from the same batch, meaning that if one had a manufacturing flaw, the others would likely share it, multiplying the potential for failure. Stupid.
Fourth, all the hard drives were original to the installation. None had been replaced proactively. So all were out of warranty and well past their expected time to fail. Stupid.
Fifth, the software suite which MIGHT have alerted this lady to the fact that her drives were in bad shape OR that her backups weren’t being done was never installed. GOD DAMN FUCKING STUPID.
…But no, it’s apparently MY FAULT her data is gone. Because I couldn’t fix the magic drive that was supposed to never break.