Write-once Media Validation
A write-once backup is written once and never altered. Combined with some simple verification techniques, this provides a special type of protection for your digital data. Write-once verification tools let you confirm that your backups are still in exactly the condition they were in when they were written. These techniques are well-suited to optical media like DVD or Blu-ray, but they can also be extended to hard drive storage.
Why is write-once media special?
Doesn't my disc-burning hardware do this for me?
Can I use this for hard drive storage?
checkSum+ for Mac
Corz Checksum for Windows
Using ImageVerifier for Mac or Windows
What to do if you spot any problems
One of the most vexing problems with data validation is that it can be impossible to tell the difference between an intentional change to a file, and one that may be caused by a computer error, a human error, or a virus.
If you remove the possibility of intentionally changing a file, then you can remove any uncertainty about whether the change is intended or not: any change to the file would be an indication of a problem. And if the files show no change, then you know everything is in exactly the same condition it was in when first stored.
A checksum-based verification of write-once media lets you know that everything is fine with the files, or that you are starting to have a problem with that copy of the files, and that you need to restore from backup right away.
Write-once media like CD, DVD or Blu-ray disc is perfect for this, since files on write-once media can't be altered once the disc has been written. You can also do this with hard drives, but only if you don’t ever update the files once you have created the checksum. Once you make any change at all to the files, you invalidate the checksum.
A disaster-recovery copy
It’s probably helpful for most people to consider their write-once copy of the files as a disaster-recovery copy. This is the copy you make, put away, periodically validate, and never expect to use, except in the case of some catastrophic loss.
Do a periodic validation
You'll want to periodically validate your write-once media to make sure that everything's fine. If you have a lot of optical discs, it's probably best to do a random sampling, making sure to choose discs from different eras, and to choose discs from different manufacturers if there is variation in your collection.
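As a rough illustration of the sampling idea, here is a minimal Python sketch that picks discs so that every combination of year and manufacturer is represented. The disc catalog, its field names (`label`, `year`, `brand`) and the function name are all hypothetical; they assume you keep some kind of list of your discs.

```python
import random
from collections import defaultdict


def pick_discs_to_validate(discs, per_group=1, seed=None):
    """Pick up to per_group discs from every (year, brand) combination."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for disc in discs:
        groups[(disc["year"], disc["brand"])].append(disc)

    chosen = []
    for key in sorted(groups):
        members = groups[key]
        # Never ask for more discs than the group actually holds.
        chosen.extend(rng.sample(members, min(per_group, len(members))))
    return chosen
```

Grouping before sampling means that a small batch of discs from an unusual brand or year cannot be skipped by chance.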
When you burn a disc, the software does some level of validation of the burning process. This might be a comparison of the file size, which provides a basic indication of the burn. The software might even do a bit-for-bit comparison of the original and the copy, which can tell you that the files have been copied perfectly.
These processes, however, only tell you about the data at the time of the disc burning. You could run the comparison again at some point in the future, but it would only give you valid results as long as the original files are never changed. If any change is made to the original file after burning, then you will get a mismatch when comparing, which makes the process much less useful.
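To make the two kinds of burn-time check concrete, here is a small Python sketch using only the standard library. It is not what any particular burning program does internally; the function names are ours. It shows why a size check is weaker than a bit-for-bit comparison: two files can have the same length and still differ.

```python
import filecmp
import os


def same_size(original, copy):
    """Quick check: do the two files have the same length in bytes?"""
    return os.path.getsize(original) == os.path.getsize(copy)


def same_bytes(original, copy):
    """Thorough check: compare the actual contents of the two files."""
    # shallow=False forces filecmp to read and compare the file contents
    # instead of trusting size and modification time alone.
    return filecmp.cmp(original, copy, shallow=False)
```

Both checks need the original file to be present and unchanged, which is exactly the limitation described above.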
The process we will examine on this page provides a much higher level of validation.
How does it work?
At its core, the process is very simple. Selected files are run through an open-source md5 (or other) checksum utility and a checksum is created for each file. The software stores the checksum along with the file name and possibly a file path. When you want to run a validation on the files, the checksum is recomputed and compared to the stored version. If it matches, the software reports that everything is fine, as shown in Figure 1. If there is a mismatch, then that is reported to the user.
Figure 1 If the stored checksum matches a recomputed checksum, then you know the stored version of the file is exactly the same as when it was first written. If it does not match, then we’ve got some kind of problem with the storage media or process.
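The process described above can be sketched in a few lines of Python with the standard `hashlib` module. This is only an illustration of the idea, not the code used by any of the programs on this page; the function names are our own.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_checksums(folder):
    """Map each file's path (relative to folder) to its checksum."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): md5_of_file(p)
        for p in sorted(folder.rglob("*"))
        if p.is_file()
    }


def validate(folder, stored):
    """Recompute each checksum and return the paths that no longer match."""
    folder = Path(folder)
    mismatched = []
    for rel_path, old_sum in stored.items():
        target = folder / rel_path
        if not target.is_file() or md5_of_file(target) != old_sum:
            mismatched.append(rel_path)
    return mismatched
```

An empty list from `validate` corresponds to the "everything is fine" result. Note that the original files are not needed for validation, only the stored checksums.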
Some software can create a single checksum for an entire folder or even an entire hard drive. While this has value for some purposes (such as for software installer CDs), it’s not ideal for media file storage. The validation methods on this page create a checksum for each file and save it individually. There are two basic ways to store these: database-stored checksums and folder-stored checksums.
Some validation software stores the checksums in a single large database. This method offers two primary advantages.
The first is that it allows the software to create a single checksum for each version of a file and keep it in a centralized place. By backing up the database (usually by backing up the User folder), this important data can be preserved comprehensively. The second advantage is that these programs are often smarter about finding a file, even if it has been moved. This can be important for some workflows.
Database-stored checksum software has some real drawbacks, however. If you have multiple computers, it can be difficult to figure out how to make use of the database as a folder is moved computer-to-computer. Additionally, since the checksum does not travel with the file, it may become unavailable due to file transfer, computer crash, or some other problem. For this reason, we now recommend that you consider using a folder-stored checksum.
The video in Figure 2 below outlines the use of ImageVerifier, which operates in this manner.
Figure 2 This movie shows how to use ImageVerifier to checksum an optical disc.
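For readers who want to see what a database-stored approach looks like in principle, here is a minimal Python sketch using the built-in `sqlite3` module. It is not how ImageVerifier works internally; the table layout and function names are assumptions made for illustration. It also shows how a central database can recognize a file that has been moved or renamed, by looking it up by checksum rather than by path.

```python
import hashlib
import sqlite3
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_database(db_path):
    """Open (or create) the central checksum database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checksums "
        "(path TEXT PRIMARY KEY, md5 TEXT NOT NULL)"
    )
    return conn


def store_folder(conn, folder):
    """Record a checksum for every file under folder."""
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            conn.execute(
                "INSERT OR REPLACE INTO checksums VALUES (?, ?)",
                (str(p.resolve()), md5_of_file(p)),
            )
    conn.commit()


def find_by_content(conn, path):
    """Return the stored paths whose checksum matches this file's contents."""
    rows = conn.execute(
        "SELECT path FROM checksums WHERE md5 = ?", (md5_of_file(path),)
    )
    return [row[0] for row in rows]
```

The drawback is visible in the sketch as well: the checksums live in the database file, so if that file is lost or left behind on another computer, the stored values are gone.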
Verification software can also keep the checksum right alongside the files in the same folder. This offers a number of advantages.
- Most important, the checksum travels with the file. So if you have the file, you have the checksum.
- Once the files have been hashed, the checksum can also be used to verify that a transfer was successful.
- This software is often simply easier to use.
- The open text-based storage of the checksum can allow multiple programs to make use of the checksum. This could be important if the program you are using becomes unsupported at some time far in the future.
There are also a few disadvantages to folder-based checksum storage. If files are moved into different folder groupings, then the checksum list for that folder will no longer be accurate. Since the checksums are stored in a text file, it’s possible to cut and paste them to reflect the new folder arrangement, but that won’t be particularly fun, easy or error-free.
We have identified and tested two programs that work with folder-stored checksums, one for Mac and one for PC. The movie in Figure 3 shows how this works on Mac.
Figure 3 This video shows how the donationware program checkSum+ works on Macintosh.
Traditionally, this technique is for use with write-once media such as optical disc, but it’s also possible to use it with hard drives, as long as you treat that data as write-once data. That means you never update the files on the hard drive: no updated metadata, name changes, format migrations or any other updates.
Many people like to keep their backup files up-to-date with any metadata they may create for the primary copy of the files. It is essential that you don’t update the write-once copy, if you want the checksum to be meaningful. You may even want to lock the disk so that you don’t accidentally update any files on the disks. This will also keep any programs from updating the files in the background, although it is possible that a virus could get around this instruction.
Locking a disk or folder on Mac
Here are the steps to lock a Mac disk. You can also change permissions for a folder to Read Only, while leaving the disk as a whole unchanged.
- To lock a Macintosh disk, select the disk, and then select File>Get Info from the Finder menu.
- At the bottom of the Get Info window, you’ll see “Sharing and Permissions”. Make sure that every user is set to “Read Only” as shown in Figure 4.
- In the gear pulldown menu, select “Apply to enclosed items.”
The disk should now not allow you to add or alter any data. If you want to add or alter data at a later date, you can reset the privileges to “Read and Write.” You may only want to turn Write on when you have more files to add to the disk.
Figure 4 This is what the Get Info panel looks like if you have set a disk or folder to be Read only.
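If you prefer to script this rather than click through Get Info, the following Python sketch does something roughly equivalent on a folder by clearing the POSIX write permission bits. This is an assumption-laden approximation: it does not touch access control lists, it is not a record of what the Finder does behind the scenes, and an administrator can still override it. The function name is ours.

```python
import os
import stat
from pathlib import Path

# Write permission for owner, group and everyone else.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def set_read_only(folder, read_only=True):
    """Remove (or restore, for the owner) write permission on a folder tree."""
    folder = Path(folder)
    # Build the full list first so the walk is finished before modes change.
    targets = [folder, *folder.rglob("*")]
    for path in targets:
        if path.is_symlink():
            continue  # leave links alone; chmod would follow them
        mode = stat.S_IMODE(path.stat().st_mode)
        if read_only:
            new_mode = mode & ~WRITE_BITS
        else:
            new_mode = mode | stat.S_IWUSR
        os.chmod(path, new_mode)
```

Calling `set_read_only(folder, False)` restores write access for the owner, which matches the idea of turning Write back on only when you have more files to add.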
Locking disks on PC
Here are the steps to set a Windows disk to read-only, as shown in Figure 5.
- Right-click an NTFS disk and choose Properties.
- Select the Security tab.
- Select each “Group or user name”, starting at the top.
- Click the Edit button.
- Check the box in the Deny column on the Write line. Click Apply.
- Repeat for the next group until you have finished each one.
Figure 5 To mark a drive as read-only, right-click and go to Properties, then go to the Security tab. Run through the User groups, changing the permissions to deny Write and hit Apply.
Locking folders on PC
You can also lock folders on a PC, but you find the controls in a slightly different place, as shown in Figure 6.
- Right-click on the folder you want to lock and select Properties.
- Click on the checkbox next to Read-only until a check appears.
- Select Apply. Another dialog will pop up, asking if you want this to apply to all subfolders and enclosed files. Click OK.
Figure 6 To lock a folder on PC, right click on the folder, and then click on the checkbox next to “Read-only” until a check appears. When you hit Apply, you will get the option to apply that setting to all included files.
checkSum+ is a folder-stored checksum program for Macintosh. It’s very simple to use, both to create the checksum and to verify it. Once you download and install the program, you simply drag a folder onto the program’s icon and it will ask you what kind of checksum you want to create, as shown in Figure 7. We suggest md5 as a good balance of security and speed.
Figure 7 checkSum+ offers three checksum choices. We recommend md5.
Where does the checksum live?
Once it’s finished, a text file with the extension .md5 is created that is named for the parent folder, as shown in Figure 8. You’ll want to move this file inside the folder at this point, where it will be stored permanently. If you were to open the file in a text editor such as TextEdit, you’d see something like what is shown in Figure 9.
Figure 8 checkSum+ creates a text document with the checksums and names it for the parent folder. We suggest you then place the checksum file inside the folder.
Figure 9 The checksum list created by checkSum+ has two elements. The first column shows the md5 checksum. The second column is the path to the file within the parent folder. If there were no subfolders, you’d only see the filename. In this case, there is a folder inside Images called Blog, and there are three images inside the Blog folder.
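To show how simple a folder-stored checksum list is, here is a Python sketch that writes a two-column text file of the kind described above: the checksum, then the path within the parent folder. The exact spacing and line layout used by checkSum+ is an assumption here (we use two spaces between columns), and the sketch writes the file directly inside the folder rather than next to it.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder):
    """Write <folder name>.md5 inside folder, one line per file."""
    folder = Path(folder)
    manifest = folder / (folder.name + ".md5")
    lines = []
    for p in sorted(folder.rglob("*")):
        # Skip the checksum list itself so reruns do not hash it.
        if p.is_file() and p != manifest:
            rel_path = p.relative_to(folder).as_posix()
            lines.append(f"{md5_of_file(p)}  {rel_path}")
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Because the list is plain text with relative paths, it stays valid when the whole folder is copied to another disk or computer.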
Verifying the checksum
The verification process is equally simple. Make sure that the checksum file is inside the parent folder, and drag the file to the checkSum+ icon. Any missing files or checksum mismatches will be reported at the end of the comparison process, as shown in Figure 10.
- OK means that the checksums match.
- BAD means that the checksums don’t match.
- MISSING means that the file is no longer in the folder.
- NEW means that there is no corresponding checksum for the file in the .md5 file.
Figure 10 checkSum+ provides a report of the status of all files that were part of the original hashing. Files will be shown as OK, Missing, Bad or New.
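The four statuses can be reproduced with a short Python sketch. It assumes a checksum list stored inside the folder with one "checksum, whitespace, relative path" entry per line; that layout is our assumption, not a specification of the checkSum+ file format.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return a dict mapping each relative path to OK, BAD, MISSING or NEW."""
    manifest = Path(manifest_path)
    folder = manifest.parent
    report = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Split once, so file names containing spaces stay intact.
        checksum, rel_path = line.split(None, 1)
        target = folder / rel_path
        if not target.is_file():
            report[rel_path] = "MISSING"
        elif md5_of_file(target) == checksum.lower():
            report[rel_path] = "OK"
        else:
            report[rel_path] = "BAD"
    # Anything on disk that the list does not mention is NEW.
    for p in folder.rglob("*"):
        rel_path = p.relative_to(folder).as_posix()
        if p.is_file() and p != manifest and rel_path not in report:
            report[rel_path] = "NEW"
    return report
```

On true write-once media you would expect only OK results; BAD or MISSING is the signal to restore from another copy.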
Corz Checksum is a folder-stored checksum program for Windows. It’s also very simple to use. Once you download and install the program, it shows up in the right-click menu, as shown in Figure 11. (There are a lot of options that first appear, but you can ignore all that and run it at the default settings.) Right-click on a parent folder and it will create checksums for each file in that folder and for all subfolders.
Figure 11 Once you install Corz Checksum, these options show up in the right-click menu. Choose “Create checksums” to start the process.
Where does the checksum live?
Once it’s finished, Corz Checksum drops a file inside each folder or subfolder that has any kind of file in it. The file is a text file named for the folder that it is inside, as shown in Figure 12. If you were to open the file in a text editor such as Notepad, you’d see something like what is shown in Figure 13.
Figure 12 Corz Checksum creates a text file for all files in this folder, and names that file for the folder. In this case, the folder is called Tasmania, so the checksum file is called Tasmania.hash.
Figure 13 The checksum created by Corz Checksum has two main elements. The first column shows the md5 checksum. The second column shows the filename.
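The per-folder arrangement can be imitated with the Python sketch below, which drops one checksum file into every folder that contains files and lists file names only, not paths. It mirrors the structure described here; the real Corz Checksum file may contain additional lines or a different column layout, so treat the format as an assumption.

```python
import hashlib
import os
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_per_folder_hashes(parent):
    """Write <folder name>.hash in each folder under parent that holds files."""
    written = []
    for dirpath, _dirnames, filenames in os.walk(Path(parent).resolve()):
        folder = Path(dirpath)
        hash_file = folder / (folder.name + ".hash")
        # Ignore a checksum file left over from an earlier run.
        names = sorted(n for n in filenames if n != hash_file.name)
        if not names:
            continue  # folders without files get no checksum file
        lines = [f"{md5_of_file(folder / n)}  {n}" for n in names]
        hash_file.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(hash_file)
    return written
```

Keeping one list per folder means a subfolder can be moved on its own and still carry its checksums with it.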
Verifying the checksum
The verification process is equally simple. Simply right-click on the parent folder (or any subfolder), and the program will run, as shown in Figure 14. Any missing files or checksum mismatches will be reported at the end of the comparison process, as shown in Figure 15.
Figure 14 Select any folder and right-click. Choose “Verify checksums” and Corz Checksum will verify checksums for all files in this folder, as well as any files in any subfolders inside this folder.
Figure 15 Corz Checksum provides a report of the status of all files that were part of the original hashing. Files will be shown as OK, Missing, Modified or New.
ImageVerifier runs on both Mac and Windows. It stores the checksum in a database, and therefore it is most practical for a computer setup where optical discs or write-once drives are always used with a single computer.
The movie in Figure 16 shows how to run ImageVerifier to store checksums and to verify them at a later date.
To store hashes
Here are the steps to store the checksums:
- Download and install the program.
- Create a job named for the disk or folder of images.
- Add the folder of images to the Job window.
- Make sure to check the “Store Hashes” checkbox.
- Run the job.
To verify checksums
Here are the steps to verify the files at a later date:
- Select the job that corresponds to the folder or disk.
- Highlight the existing files and hit the “–” button.
- Hit the “+” button and add the disk to the job window.
- Make sure to check the “Check hashes” box.
- Run the job.
- Confirm the results.
Figure 16 This video outlines the workflow for creating and verifying checksums with ImageVerifier.
It's likely that you'll come across some kind of checksum mismatch. At that point, you'll want to track down the corruption and correct it. Try opening the file identified as mismatching in your image editing application and see if it appears fine (there can be small mismatches that are not visible in the file). If the files don't open, you'll want to make a list and find these files in your primary storage or other backup storage and confirm that these copies still have integrity. If they are fine in other locations, it's time to burn new discs.
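Checking the other copies can also be done by checksum rather than by eye. The Python sketch below takes the files that failed validation, along with the checksums stored when they were first written, and looks for an intact copy in a list of other locations. The function name and the shape of the inputs are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_intact_copies(mismatched, other_locations):
    """Name the first location that holds an intact copy of each file.

    mismatched maps a relative path to the checksum stored at write time.
    The result maps each path to a location, or to None if no copy matches.
    """
    results = {}
    for rel_path, stored_sum in mismatched.items():
        results[rel_path] = None
        for location in other_locations:
            candidate = Path(location) / rel_path
            if candidate.is_file() and md5_of_file(candidate) == stored_sum:
                results[rel_path] = str(location)
                break
    return results
```

Any file that comes back as `None` has no verified good copy in the locations you listed. This only works if the other copies have not been edited since the checksum was made.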
Pay attention to early warnings
If you find any data mismatches, you'll also want to check any other discs of the same brand. You also might want to check any other discs that were burned in the same computer, since substandard disc burning can result in data decay over time.