Consumer or Enterprise Drives for RAID? (Part 2 of 2)

Monday, July 4, 2011 at 9:03 AM

In my last post, I described a couple of "ideal" scenarios that involved a standalone consumer-class hard drive, along with enterprise-class drives connected to a RAID controller. For the big finale, let’s look at the non-ideal scenario:

Scenario #3: Let’s say your data is stored on a RAID array using consumer-class drives.

You go to print your paper and one of the hard drives is unable to read a sector. What happens now? As mentioned previously, consumer-class drives don't give up quickly. On the flip side, RAID controllers don't have much patience. After a handful of seconds, the controller says, “That drive is not responding to commands, so it must have failed. I’m going to kick it out of the array and get on with my day.” The controller detaches the drive from the array and recreates the missing data from the remaining drives, and you’re able to print your paper.

So far that’s not such a bad thing, at least as far as your paper is concerned. You were able to print it out and go on your way. Due to the nature of RAID, all you should have to do is put a new hard drive into the array and the controller will rebuild the missing data from the other drives. Right?

Unfortunately, this leaves you in a somewhat precarious position. The data on your array is now at risk (assuming RAID5). You don’t have any redundancy until the array has been completely rebuilt. What are the chances that you’d have 2 drives fail at the same time? Pretty low. What about the chances of a single read error on one of the remaining consumer-class drives during the rebuild process? Much greater, because a rebuild has to read every sector on every remaining drive. And guess what happens when one of those other drives encounters a read error, takes heroic measures to recover the sector, and the controller kicks it out of the array? Very, very bad news for your data. Kiss it goodbye; you'd better have backups.
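To put a rough number on "much greater", here is a back-of-the-envelope sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits read (a figure commonly quoted on consumer drive spec sheets, not something measured here) and treats every bit as an independent trial, which real drives don't strictly obey. The function name and the example drive sizes are made up for illustration.

```python
import math


def rebuild_read_error_probability(drive_tb, surviving_drives, ure_per_bit=1e-14):
    """Chance of at least one unrecoverable read error during a full rebuild.

    ure_per_bit is an ASSUMED spec-sheet style rate, and errors are
    assumed to be independent from bit to bit.
    """
    # A rebuild reads every bit on every surviving drive.
    bits_read = drive_tb * 1e12 * 8 * surviving_drives
    # 1 - (1 - p)^n, computed via log1p/expm1 so the tiny p is not lost to rounding.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical example: a 3-drive RAID5 of 2 TB drives that has lost one drive.
p = rebuild_read_error_probability(drive_tb=2, surviving_drives=2)
print(f"Chance of a read error during rebuild: {p:.0%}")
```

Under those assumptions, the example works out to roughly a 27% chance of hitting at least one read error before the rebuild finishes. Bigger drives or more of them push that number up quickly.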

Scenario #4: Let’s compare this same situation using enterprise-class drives. You go to print, there’s a read error on one of the drives, the drive gives up after 7 seconds and notifies the controller, and the controller recreates the data for that one sector from the other drives. ALL of your drives stay in the array! The controller can then write the recreated data somewhere else on the drive that reported the error, and you’re as good as new!
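For the curious, here is a minimal sketch of how that recreation can work, assuming RAID5-style single parity where the parity block is the XOR of the data blocks in a stripe. The function name `xor_blocks` and the sample blocks are invented for illustration; a real controller does this in firmware or hardware across whole stripes.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks in one stripe (made-up contents), plus their parity block.
data = [b"my t", b"erm ", b"pape"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] reported a read error for this sector.
# XOR-ing the surviving data blocks with the parity block yields the lost block.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```

The same arithmetic is what a full rebuild does, just repeated for every stripe on the array, which is why a rebuild has to read everything on the surviving drives.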

The moral of the story is: TLER/CCTL/ERC ensures that your hard drives stay in the array even when they encounter an error. Consumer-class drives are much more likely to be kicked out of an array under similar circumstances – and that’s bad news for your data.
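The difference between Scenarios #3 and #4 boils down to which timer runs out first. Here is a toy model of that race. The 7-second figure comes from the scenario above; the 8-second controller timeout and the 60-second consumer recovery time are placeholder assumptions, since actual values vary by controller and drive. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    # Longest time the drive keeps retrying a bad sector before reporting an error.
    max_recovery_seconds: float


def controller_reaction(drive, controller_timeout_seconds=8.0):
    """Toy model: what a RAID controller does when a drive hits a bad sector.

    controller_timeout_seconds is an ASSUMED value standing in for
    "a handful of seconds".
    """
    if drive.max_recovery_seconds < controller_timeout_seconds:
        # The drive reports the error in time, so only one sector needs fixing.
        return "sector rebuilt from the other drives; drive stays in the array"
    # The drive is still retrying when the controller loses patience.
    return "drive kicked out of the array; array is degraded"


enterprise = Drive("enterprise (TLER/CCTL/ERC)", max_recovery_seconds=7.0)
consumer = Drive("consumer", max_recovery_seconds=60.0)  # assumed figure

for drive in (enterprise, consumer):
    print(f"{drive.name}: {controller_reaction(drive)}")
```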

This happened to me with some slight variations. I was using RAID6, which preserves data even with 2 drive failures. When one drive failed, I replaced it with a different one. During the rebuild, another drive was kicked out of the array, and during a subsequent rebuild a 3rd drive was kicked out as well. This toasted the data on my array. It took weeks, $$$, and a lot of effort to gather that data back together, which was probably a lot more than the cost delta between consumer-class and enterprise-class drives, and definitely more than a decent backup solution.

I've since moved to a ZFS-based storage appliance (NexentaStor) and religiously back up all of my data.
