The Digital Negative (DNG) file format offers some unique capabilities for data validation that are not found in any other format. This page explains those features and how to use them.
The DNG file format is the only openly documented file format specially designed for the challenges of building a digital photography collection. One of DNG's most attractive capabilities, from an archival standpoint, is the ability to embed validation codes in the file itself, so file integrity can be easily confirmed, even for large numbers of files.
Unlike TIFF, JPEG, or PSD, a DNG file contains a source image that is never supposed to be altered. Any work to adjust a DNG is done by means of instructions. The changes are applied in real time as the DNG is opened in the image editing software. This large, unchanging part of the DNG enables a special kind of data validation. As of DNG specification 1.2, an MD5 validation hash can be embedded in the file itself. This hash refers only to the unchanging source image data, so it will remain useful even after the file is readjusted or additional metadata is added.
There is a second validation hash that is used if the original source file is embedded in the DNG. This makes the DNG file the safest place to keep a proprietary raw file because you can know with certainty if even one bit has changed in the file.
Figure 1 The DNG contains a validation checksum that can let you know if even one bit has changed in the image data in the file. This is not supposed to happen, and is an indication of some kind of problem with your computer system.
Because the data validation hash lives in the file itself, and because it remains valid even after the file has been re-worked, DNG is the most easily validated file type. You can check the health of a set of files and know with nearly absolute certainty if the files remain in good condition. All that's necessary is for the hash to be recomputed and checked against the one that is embedded in the file. Any mismatch indicates a problem with storage or transfer.
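The recompute-and-compare step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the way any Adobe tool does it: the function names are hypothetical, and the sketch assumes the raw image bytes and the embedded digest have already been extracted from the DNG. The DNG specification defines exactly how the image data is fed to MD5, and that step is not reproduced here.

```python
import hashlib
import hmac


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def digest_matches(data: bytes, embedded_hex: str) -> bool:
    """Recompute the MD5 of data and compare it with the stored digest.

    Assumption: `data` is the unchanging image data, already extracted
    in the byte order the DNG specification prescribes, and
    `embedded_hex` is the digest that was stored in the file.
    """
    recomputed = md5_hex(data)
    return hmac.compare_digest(recomputed, embedded_hex.strip().lower())
```

If a single bit of the data is flipped after the digest has been stored, `digest_matches` returns `False`, which is the condition that would trigger a warning.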
With file types other than DNG, hard-core data validation requires a complex set of checks of each part of the system. For DNGs, all you need to do is check for completeness and then do a hash-check.
Canaries were used in coal mines as early warnings of problems with the air supply. The DNG validation hash offers not just a verification of the files themselves, but it also verifies that they have been handled properly by each part of your system. You don't have to check RAM, disks, enclosures, or networks; just make sure that the files are valid and you'll know that they have been handled properly at each step.
Conversely, if there is some problem with the files, you know there is an issue with some part of your computer system, and you'll need to track it down. In this way, the absolute validation provided by the DNG checksum gives you an early warning that a problem exists.
At the moment, the easiest way to validate DNG files is to send them back through the DNG Converter and check the log for error messages. You can send tens of thousands of DNG files to the converter at once, and let it chug through the validation for however long it takes (sometimes days). Of course, you'll need enough "landing space" for the new DNG files without filling up the destination drive. Every few hours during the process, you can erase these files as they are created to keep the volume trimmed down. Figure 2 shows this process in action.
Figure 2 This video shows how to use Adobe DNG Converter to validate a collection of DNG files.
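Before starting a run that may last days, it can help to check whether the destination drive has enough "landing space" for the files the converter will write. The sketch below is a rough estimate only. It assumes the reconverted files will be about the same size as the originals, and the function names and the 10 percent headroom figure are illustrative choices, not anything taken from the DNG Converter.

```python
import shutil
from pathlib import Path


def total_dng_size(root: Path) -> int:
    """Sum the sizes, in bytes, of all .dng files under root."""
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".dng"
    )


def has_landing_space(source_root: Path, destination: Path,
                      headroom: float = 1.1) -> bool:
    """Return True if destination has room for a full reconversion.

    Assumption: output files are roughly the same size as the sources,
    padded by `headroom` as a safety margin.
    """
    needed = int(total_dng_size(source_root) * headroom)
    free = shutil.disk_usage(destination).free
    return needed <= free
```

If the check fails, the run can still go ahead as long as the new files are erased periodically, as described above.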
The latest versions of Adobe Lightroom and Adobe Camera Raw will also perform a validation of DNG files, but neither of these programs makes it easy to validate a large number of files at once. Lightroom will only validate one image at a time in the Develop module. Camera Raw will validate all images that are loaded into Camera Raw, but this gets cumbersome to accomplish with more than one folder at a time.
If Lightroom finds a checksum mismatch, it will present the warning shown in Figure 3.
Figure 3 shows the warning Lightroom will display when it finds a hash mismatch.
Figure 4 If you open a file in Camera Raw and the file data does not match the embedded checksum, you will see this warning. You can still open the file and check to see if the error is visible.
Adobe has made the code for validating DNGs available for free as a part of their SDK (Software Development Kit). This means that any application that works with DNG files can implement the validation without spending a lot of time on engineering. We should see a number of applications taking advantage of this code in the near future.
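As one illustration of what such a tool might look like, the sketch below walks a folder tree and hands each DNG to an external validator, collecting the files that fail. It assumes that the `dng_validate` command-line program from the DNG SDK has been built and is on the PATH, and that it exits with a non-zero status when a file fails validation. Both points are assumptions to verify against your own build. The Python function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Iterable, List, Sequence


def find_dng_files(root: Path) -> List[Path]:
    """Return every .dng file under root, sorted for repeatable runs."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".dng"
    )


def validate_files(files: Iterable[Path],
                   validator: Sequence[str] = ("dng_validate",)) -> List[Path]:
    """Run the validator on each file and return the ones that fail.

    Assumption: the validator takes the file path as its last argument
    and signals a problem with a non-zero exit status.
    """
    failed = []
    for path in files:
        result = subprocess.run(
            [*validator, str(path)],
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling `validate_files(find_dng_files(Path("/path/to/archive")))` would return the list of files that need a closer look. Because the validator command is a parameter, a different checker can be swapped in without changing the loop.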