Hard Drive Failure

Posted at 5:17:47 AM in Hardware (8) | Read count: 3341

RAID Drive Failure:

The company had set up a junk RAID server to handle bulk image storage for a document imaging service. The document imaging program was Docuware, which used SQL as its database and allowed the images to be stored on any attached storage device. However, we were told that certain options of the software we had purchased would not work unless the images were stored on the same server as the SQL installation. The option in question was the ability to store the images on CDs so that the CDs could be shipped to another location and viewed there.

Apparently, the RAID already had a faulty drive in a 3-disk configuration. This created a potentially bad situation. We had identified the wrong drive as being the problem, but didn't think it was an issue since we were backing up the data to an external removable drive.

The backup software we used was NTBackup, with a backup script that ran in the scheduler. It backed up only the changed or added files on weeknights, and on Fridays it backed up the entire drive. The backup file size was approximately 69 GB, with over 800 TIFF and JPEG images and a huge directory structure.

A description of the file structure is helpful here, as the recovery process depended heavily on knowing how this structure worked and what the file types were. Docuware creates the directories and file names using a math algorithm that lets it determine where a document can be found from the document number alone. Each directory can hold 255 images, and a 3-digit folder number increments up to 255. When the folder number reaches 255, a new upper folder is created, sub folders are created under that folder, and the increments start over at 000. So a document for royalties might be stored at royalties.000/000/00000001.001. The extension on the document increments with the number of pages in the document, and the file name continues to increment in order to limit the number of files contained in one directory. Because Windows file types are determined by extension, this system doesn't allow the document types to be indicated. These files are TIFF files, and the header of each TIFF file holds data about the document and any attachments that may be required. I mentioned JPEG images before; these are also named by number, but end in JPG. These files are linked to their TIFF files by the data in the headers, and those TIFF files contain no image info.
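To make the naming scheme a bit more concrete, here is a minimal Python sketch of how a document number could map to a path. It is only my reading of the description above, not Docuware's actual algorithm: the function name `document_path`, the two constants, the decimal numbering starting at 1, and the use of the extension as a page number are all assumptions.

```python
from pathlib import PurePosixPath

# Both constants are assumptions taken from the description in this post,
# not values confirmed by any Docuware documentation.
FILES_PER_FOLDER = 255    # "each directory can hold 255 images"
FOLDERS_PER_UPPER = 256   # sub folders 000 through 255 under one upper folder


def document_path(cabinet: str, doc_number: int, page: int = 1) -> PurePosixPath:
    """Return the assumed storage path for one page of a document.

    Assumes documents are numbered in decimal starting at 1 and that the
    3-digit extension is the page number (hypothetical reading).
    """
    if doc_number < 1 or page < 1:
        raise ValueError("doc_number and page must be 1 or greater")
    # Which leaf folder the document falls into, counting from zero.
    folder_index = (doc_number - 1) // FILES_PER_FOLDER
    # Split the running folder count into upper folder and sub folder.
    upper, sub = divmod(folder_index, FOLDERS_PER_UPPER)
    return (
        PurePosixPath(f"{cabinet}.{upper:03d}")
        / f"{sub:03d}"
        / f"{doc_number:08d}.{page:03d}"
    )


print(document_path("royalties", 1))
```

The point is simply that the path carries the document number, so a recovered file without its path cannot be matched back to its database record.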

The reason the file structure is important is that, to restore the data, it is imperative that the directory structure remain intact. A simple undelete program might recover tons of images, but most recovery software examines the contents of each file, creates a file name of its own, and provides no directory structure, so simply having the images is useless.
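For anyone wondering what "examines the contents of the file" means in practice, the sketch below identifies TIFF and JPEG data from the first few bytes. The magic numbers are the standard ones for those formats; the function name `sniff_image_type` is just something I made up for illustration, and real recovery tools do considerably more than this.

```python
def sniff_image_type(header: bytes) -> str:
    """Guess the image type from the leading bytes of a file or cluster."""
    # TIFF starts with a byte-order mark followed by the number 42:
    # "II" + 2A 00 (little-endian) or "MM" + 00 2A (big-endian).
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # JPEG starts with the start-of-image marker FF D8 and another FF.
    if header[:3] == b"\xff\xd8\xff":
        return "jpeg"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(4))


# A file named 00000001.001 still identifies itself by content.
print(sniff_image_type(b"II*\x00rest-of-file"))
```

Note what this does not give you: the signature says what a file is, but nothing about what it was called or which directory it lived in.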

A second drive in the RAID then failed, making the entire RAID useless. The original sporadically failing drive was thought to be part of the operating system volume, and the plan at that time was to obtain an image of the OS and, when the drive failed entirely, restore that image to a single drive. But it turned out that the original failed drive was in the data array.

Since the OS continued to work, the backup scheduler continued to work, which exacerbated the issue. The drive failed on Friday. No one contacted IT about the issue until Monday, which meant that the Saturday full tape backup had already run. The NTBackup software was configured to overwrite the existing file with the new backup. As a result, all the data was lost and the main backup was lost as well.
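One way to avoid a single overwritten backup file is to rotate the target file name so that this week's full backup never replaces last week's. The Python sketch below only picks the file name; it does not call NTBackup, and the name `backup_target`, the `.bkf` naming pattern, and the Friday full backup are illustrative assumptions based on the schedule described earlier.

```python
from datetime import date

DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def backup_target(day: date, generations: int = 2) -> str:
    """Choose a backup file name that alternates between weekly slots.

    Assumes a full backup on Friday and incrementals on other days
    (hypothetical naming, not an NTBackup feature).
    """
    # Continuous Monday-based week counter, so it does not reset at New Year.
    week_index = (day.toordinal() - 1) // 7
    slot = week_index % generations
    if day.weekday() == 4:  # Friday
        return f"full_slot{slot}.bkf"
    return f"incr_slot{slot}_{DAY_NAMES[day.weekday()]}.bkf"


# Two consecutive Fridays land in different files.
print(backup_target(date(2010, 12, 17)))
print(backup_target(date(2010, 12, 24)))
```

With two slots, a full backup taken of a dead array would have destroyed only one of the two full backup files.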

The RAID was disassembled and images were made of the drives. An attempt was then made to re-create the RAID with software, but problems such as not knowing the striping algorithm and the drive header information prevented an easy rebuild. Two of the drives were accessible when assembled in a restructuring environment, but no directory structure could be recovered.
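As background on why the striping details matter: this post never states the RAID level, but if a 3-disk array uses RAID 5 style parity (an assumption on my part), each stripe holds two data blocks plus one parity block that is the XOR of the other two. The sketch below shows the XOR step on toy data; `xor_blocks` is a name I picked for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-sized blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# One toy stripe: two data blocks and the parity computed from them.
data_0 = b"TIFFDATA"
data_1 = b"JPEGDATA"
parity = xor_blocks(data_0, data_1)

# If the disk holding data_1 is lost, the other two blocks rebuild it.
recovered = xor_blocks(data_0, parity)
print(recovered == data_1)
```

The arithmetic is the easy part. To apply it to real disk images you also need the stripe size, the disk order, and how the parity block rotates from stripe to stripe, and guessing any of these wrong produces files mixed with garbage.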

We then located a data recovery company that would attempt to recover the data. If they were able to recover the data, we'd pay; otherwise no money would be involved. The amount was negligible anyway, so we chose to send the drives off for recovery and then turn our attention to the backup drive.

The company we used was 1stdatarecovery.com. This company offers to recover data from RAIDs for $800 USD. You really have to examine the web site to determine the actual cost; because these were SCSI drives, there was a 150% markup. There is also an additional fee for getting the drives back and a supposed additional fee for any extra information, such as a file listing of the recovered files (it is very important to take advantage of this). Still, the price to recover the data was way below that of other organizations, which wanted the money up front and charged 10 to 100 times that amount.

They also had a location in California, which I thought was handy, but it turns out the company is actually in Canada and all the other locations are UPS drop-off spots. I really confused the shipping process by showing up at the UPS drop-off store. The worst part was that I couldn't get a tracking number, as all the drives being sent to Canada were aggregated into a larger package and shipped in bulk. I called later and got the tracking number of the bulk shipment.

1stdatarecovery.com was very prompt at getting back to me for additional information. Although I was put off at having to answer the same question over and over, there was almost daily communication, which impressed me. The only problem came when they said they had the data recovered. I asked for a partial directory listing, which they were happy to send me for free. That was a major folly. We paid the $1300 and had the data sent to me, and then found out that the partial listing they had sent covered the only 2 GB of data for which they had a directory structure. In addition, all of the files were cross-hashed with bogus data, as they had the striping completely wrong. The remaining 100 GB of data was all in one directory with made-up names, which was exactly what I would have gotten with my own recovery program.

1stdatarecovery.com offered to re-extract the data, but I was having pretty good luck with the backup drive, so I ignored the offer; I didn't want to pay to get the data a second time only to find it still not correct. In addition, they didn't have an external drive to save the data to, so I'd have had to either buy a drive or send them one.

During the rebuilding phase, I had purchased Active UNDELETE 7 Enterprise, which claimed the ability to rebuild and recover from RAIDs. The recovery process was very flaky and the program crashed frequently. I decided I should build a bootable CD, as the software offers that option; however, the CD is seriously limited and did not offer any ability to rebuild the RAID, so I had to install the software on the server. After much work, I realized I didn't have enough information to rebuild the array, and there was nothing in the program to help determine striping or parity. I eventually abandoned the software for rebuilding the RAID.

After I sent the drives off to 1stdatarecovery.com, I thought I might be able to use Active UNDELETE 7 Enterprise to extract the lost BKF files from the backup drive. However, that also became a problem, as Active UNDELETE 7 Enterprise does not have a signature for BKF files, and though it has an option for updating its signatures, apparently no one is adding new file detection thumbprints, so the money I paid for it was wasted.

I later found Handy Recovery from Softlogica. It had a thumbprint for BKF files and found several clusters on the drive that were intact enough to give me files 20 and 30 GB in size. I used the evaluation version to extract one or two large files (you can only extract one file a day during the evaluation, but the large files really made this worth it). After I had extracted the files, I couldn't find any software that would rebuild files from the corrupted BKF files until I ran across NTBKUP.exe at www.fpns.net/willy/msbackup.htm. This page gave me the info I needed to recover the data. The designer of this package blows past a lot of the overhead of NTBackup and allows the contents to be read even if the headers are missing. It would not build the directory structure without the drive letter being in the file, and I didn't have the drive letter in all of the clusters, but it worked for me.

I was able to extract the directory structure and the location of each file by running NTBKUP in verbose mode and redirecting the output to a text file. I later manipulated that file to create the directories, then changed into each directory and ran the extract for the files that belonged to that group.
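The directory-creation step can be sketched roughly as follows. The exact format of the NTBKUP verbose output is not reproduced in this post, so the sketch assumes the listing has already been reduced to one Windows-style file path per line; that format and the function names `directories_from_listing` and `create_tree` are hypothetical.

```python
from pathlib import Path, PureWindowsPath


def directories_from_listing(lines):
    """Collect the unique directories named in a listing of file paths.

    Assumes one backslash-separated file path per line, with or without
    a drive letter (a hypothetical, pre-cleaned format).
    """
    dirs = set()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        path = PureWindowsPath(text)
        # Drop the drive letter and root so the tree can be rebuilt anywhere.
        if path.anchor:
            path = path.relative_to(path.anchor)
        if len(path.parts) > 1:
            dirs.add(path.parent.parts)
    return sorted(dirs)


def create_tree(root, dirs):
    """Create every collected directory underneath root."""
    for parts in dirs:
        Path(root).joinpath(*parts).mkdir(parents=True, exist_ok=True)


listing = [
    "D:\\royalties.000\\000\\00000001.001",
    "royalties.000\\001\\00000256.001",
]
print(directories_from_listing(listing))
```

Stripping the drive letter also sidesteps the clusters where the drive letter was missing, since paths with and without it end up in the same tree.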

There are some anomalies in the NTBKUP output that I can't explain, but over 95% of the data and the file structure was recovered (see details).

Lessons learned:

1. Don't rely on only one backup device. Currently, I am rotating two external backup devices and checking them for consistency and errors.

2. Pay attention to failed drives. Of course, my resources are limited to what the owners will pay for, and it always bothers me when my recommendations are ignored and I later have to present them with an issue that could have been avoided.

3. Obtain the entire evidence of recovery rather than a portion. Of course, even if I had had a complete listing of the directories, I couldn't have been sure that the files were complete. 1stdatarecovery.com couldn't inspect the files either: since the extensions were numbers, they couldn't tell the files were TIFF files, even though I had explained it. I might as well have been speaking a foreign language, as what I was telling them was unfamiliar to them.
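The consistency check mentioned in the first lesson could look something like the Python sketch below, which compares SHA-256 checksums of the source tree against a restored or copied tree. This is one possible approach rather than a description of the exact check in use, and the function names are made up for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root onto its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(source, backup):
    """Return (missing, changed) relative paths when checking a backup."""
    missing = sorted(set(source) - set(backup))
    changed = sorted(
        name for name in source
        if name in backup and source[name] != backup[name]
    )
    return missing, changed
```

Because the manifest is keyed by relative path, it checks the directory structure as well as the file contents, which is exactly what mattered in this recovery.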

Written by Leonard Rogers on Tuesday, December 21, 2010 | Comments (0)

