How to Archive your Digital Photos

In this section, we look at the principles behind best practices for archiving digital photographs. The components of a digital photo archive are transfer, storage media, validation, and migration.

Archive overview
Archive vs. Backup
When do backup files need to move to the archive?
Store archives and backups separately
Data integrity and file transfer
Verified data transfer utilities
Optical media
Searching the archive
Organize by cataloging software or by computer directory structure?
Migrating the archive

Archive overview

We believe it is important to set up your archive up front, before you start new work. The reality is that most of us already have a collection of image files stored in whatever way served the immediate need, with little thought given to a long-term plan. “I was planning on doing that when I have time…” Sound familiar? Make the time now to create a system. Few of us have developed a truly thought-out plan for keeping track of our work. As a result, when a client needs to revisit a project, or we want to pull from existing material or add to an ongoing body of work, finding exactly what we need takes an enormous amount of time (usually with some help from our buddy, luck). dpBestflow wants to clarify the difference between backup and archive and give you strategies for building a plan that works for you, makes things more efficient, and protects all of your work and image files.

Archive vs. backup

There is understandably some confusion between the terms archive and backup. Applying the term archive to digital information can be misleading, because archive implies permanent storage. Currently, no digital storage medium is archival. The most we can hope for is that digital storage lasts until we can migrate the files to a truly archival medium.

  • Archives are the primary copy.
  • Backups are secondary copies of the same data.

It is the unchanging nature of this kind of image storage that defines an archive. That’s not to say that you will never revisit these files. Depending on your photographic niche, you may revisit your archives rarely or often; if you shoot for stock, art, or personal projects, it is likely to be often. Some commercial photographers, on the other hand, need archives primarily because many of their clients do not have robust DAM (digital asset management) systems and are apt to lose delivery files.

Archive

Simply put, an archive is a collection of images kept in secure storage. Different kinds of archives occur at different stages of the workflow:

  • Archive of original capture files
  • Archive of the master files which contain image optimization
  • Archive of derivative files
  • A final archive that contains all of the above

Backup

A backup is a copy of digital image files whose purpose is to restore the originals in the event of data loss. We cannot emphasize enough the importance of backing up digital image data throughout the entire workflow. Think of backups as an insurance policy, in case the unthinkable happens. Eventually backups become archives, and archives themselves need to be backed up.

A backup is useful for:

  • Disaster recovery due to media failure
  • Restoring small numbers of files that have been accidentally deleted or corrupted.
  • Preventing data loss. Data loss is very common. Nearly half of all computer users have lost digital files or experienced data corruption.
  • Protecting primary working file storage

Backups are usually:

  • Temporary storage
  • Created instantly in the case of mirrored RAID
  • Created on regular or semi-regular intervals either manually or by scheduling software
  • Used to protect your working files while they are active

Read more about backup in the Backup Section

Backups eventually become your archive, which also needs to be backed up.

When do backup files need to move to the archive?

Image files have different life cycles depending on the photographer’s workflow and on whether the files are camera originals, derivatives of the originals, or delivery files. Camera original JPEG or TIFF files should be archived as soon as they are past the editing stage; because they are written in a standard file format, they need to be protected from being overwritten. Camera original raw files can be archived immediately after ingestion, especially if you batch rename and add bulk metadata during ingestion. If you prefer to batch rename and continue adding custom metadata during the edit, you may choose to archive after that work is finished. We suggest that it is most useful and efficient to archive after the first round of optimization, which ensures that PIE (parametric image editing) data is contained in the archived files. In that workflow, the camera original raw (or DNG) files are backed up and archived after they have been edited and renamed, and after bulk metadata and PIE adjustments have been applied.
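The timing suggested above can be expressed as a simple rule. The sketch below is Python 3.9+; the `CaptureFile` record and its flags are hypothetical stand-ins for whatever your cataloging or ingestion software actually tracks, and the rule is only our suggested default, not a fixed requirement.

```python
from dataclasses import dataclass


@dataclass
class CaptureFile:
    """Hypothetical record of one camera original and the work done on it."""
    name: str
    is_raw: bool         # raw or DNG (True) vs. camera JPEG/TIFF (False)
    edited: bool         # culled during the edit
    renamed: bool        # batch renamed
    bulk_metadata: bool  # bulk metadata applied
    pie_adjusted: bool   # first round of PIE optimization done


def ready_to_archive(f: CaptureFile) -> bool:
    """Return True when a camera original is ready to move to the archive."""
    if not f.is_raw:
        # JPEG/TIFF originals: archive as soon as they are past the edit,
        # so the standard-format file cannot be overwritten later.
        return f.edited
    # Raw/DNG originals: wait until the file is edited and renamed and
    # bulk metadata and first-round PIE adjustments are in place.
    return f.edited and f.renamed and f.bulk_metadata and f.pie_adjusted
```

If your workflow archives raw files right after ingestion instead, the raw branch would simply check `renamed` and `bulk_metadata`.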

As projects progress, additional files will need to be added to the archive, especially in an Optimized Workflow. The additional files can be:

  • Masterfiles
  • Derivative files prepared for proofing, printing and delivery

Store archives and backups separately

It is important to organize archives so that adding and retrieving images does not affect the integrity of the stored data. The first rule is that both backups and archives should be kept separate from the workstation’s main hard drive, the drive that holds the application and system files. Hard drives or other media, such as CD, DVD, Blu-ray, or, more rarely, tape, can be used. Hard drives can be internal but separate, or external drives attached by USB, FireWire, or eSATA. RAID devices are also an option, although RAID systems (except for mirrored RAID) require their own backups: their fault tolerance protects against the failure of a drive, but they cannot back themselves up completely.
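One way to check the first rule from a script is to compare filesystem device IDs. This is a minimal Python sketch, assuming the archive volume is mounted at a path you supply (the example path is hypothetical). Note that two partitions on the same physical disk report different device IDs, so a `True` result confirms a separate volume, not necessarily a separate drive.

```python
import os


def on_separate_device(path: str, system_path: str = os.path.abspath(os.sep)) -> bool:
    """Return True if `path` is on a different filesystem device than
    `system_path` (by default the root of the current system drive)."""
    return os.stat(path).st_dev != os.stat(system_path).st_dev
```

For example, `on_separate_device("/Volumes/Archive01")` should return `True` on a Mac with an external archive drive mounted there.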

Data integrity and file transfer

Data integrity is, of course, a core requirement for digital image archives. An important means of preserving data integrity is to use a backup utility to transfer digital image files from the working file drive to the archive.

Using the Finder (Mac) or Windows Explorer to copy files from the working drive to backup and archive drives is potentially unreliable, and neither offers a verification function. We have found that while the operating system is usually reliable for copying single folders and smaller sets of data, transferring large amounts of data arranged in nested folders is not failsafe. Even the slightest problem with a cable, connection, or power supply can leave some files incompletely transferred. The operating system’s file transfer application can crash and relaunch undetected during lengthy transfers, and a single corrupt file in a batch can cause a silent Finder crash that endangers the other files being transferred.
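What a verified transfer adds is a check that each copy matches its source. The dedicated utilities named in the next section do this for you; the sketch below only illustrates the idea in Python 3.9+, using SHA-256 checksums from the standard library. One caveat: an immediate read-back may be served from the operating system’s cache rather than the disk, so this is weaker than a tool that verifies against the media itself.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MB chunks so large raw files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_copy_tree(src: Path, dst: Path) -> list[Path]:
    """Copy every file under src to dst (keeping nested folders), re-read
    each copy, and compare checksums. Returns paths that failed verification."""
    failures = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copies data and timestamps
        if sha256_of(src_file) != sha256_of(dst_file):
            failures.append(rel)
    return failures
```

An empty return list means every file arrived intact; anything else should be copied again before the source is touched.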

Verified data transfer utilities

Utilities such as Synchronize! Pro X or ChronoSync for the Mac, and SyncBack Pro or Acronis for Windows, allow verified transfer of image data as well as incremental backups, which add only new or changed data to the archive. Incremental backups protect against random file corruption by keeping the amount of data written to an archive to a minimum. They are also more efficient, because they build on the previously stored data rather than copying everything again.
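The incremental idea can be sketched in a few lines of Python 3.9+. This is not how any of the utilities named above work internally; it assumes a common heuristic (a file counts as changed when its size or modification time differs) and verifies each copy byte for byte with `filecmp.cmp`. It never deletes anything from the archive, so the archive only grows.

```python
import filecmp
import shutil
from pathlib import Path


def needs_copy(src_file: Path, dst_file: Path) -> bool:
    """New if missing from the archive; changed if size or mtime differ."""
    if not dst_file.exists():
        return True
    s, d = src_file.stat(), dst_file.stat()
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def incremental_backup(src: Path, dst: Path) -> list[Path]:
    """Copy only new or changed files from src to dst, verifying each copy.
    Returns the relative paths that were copied on this run."""
    copied = []
    for src_file in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = src_file.relative_to(src)
        dst_file = dst / rel
        if not needs_copy(src_file, dst_file):
            continue  # unchanged since the last run: leave the archive copy alone
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # preserves mtime for the next comparison
        if not filecmp.cmp(src_file, dst_file, shallow=False):
            raise OSError(f"verification failed for {rel}")
        copied.append(rel)
    return copied
```

Running it twice in a row copies everything the first time and nothing the second, which is exactly the behavior that keeps writes to the archive to a minimum.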

Optical media

Optical media burning software has built-in data verification, and we recommend that you always verify burned discs. Discs that have been in storage can be verified against the original data with the burn software, but only if the data has not changed or the original burn data or “project” was kept. Another method is to verify the image data with software such as Image Verifier and then store the Image Verifier hash folder with the image data when you burn the disc. You can then use Image Verifier and the stored hashes to verify the disc’s image data at any time.
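The stored-hash approach can be illustrated with a plain checksum manifest. This Python 3.9+ sketch does not read or write Image Verifier’s format; it assumes a hypothetical manifest file named `checksums.sha256` in the same line format as the `sha256sum` tool (`<hex digest>  <relative path>`). Write the manifest into the folder before burning, then run the verify step on the mounted disc later.

```python
import hashlib
from pathlib import Path

MANIFEST = "checksums.sha256"  # hypothetical manifest file name


def file_hash(path: Path) -> str:
    """SHA-256 of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(folder: Path) -> Path:
    """Record a hash for every file in folder (except the manifest itself)."""
    lines = [
        f"{file_hash(p)}  {p.relative_to(folder).as_posix()}"
        for p in sorted(folder.rglob("*"))
        if p.is_file() and p.name != MANIFEST
    ]
    manifest = folder / MANIFEST
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(folder: Path) -> list[str]:
    """Re-hash each listed file and return paths that are missing or changed."""
    bad = []
    for line in (folder / MANIFEST).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, rel = line.split("  ", 1)
        target = folder / rel
        if not target.is_file() or file_hash(target) != digest:
            bad.append(rel)
    return bad
```

Because the format matches `sha256sum`, the same manifest can also be checked on many systems with `sha256sum -c checksums.sha256` run from the disc’s top folder.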

Searching the archive

The value of an archive and its images is dependent on how well the archive is organized and the amount and quality of information contained in the image files.

Images that contain informational metadata, such as descriptive categories, keywords, and IPTC parameters such as date, time, location, and subject, are easier to find than images without this information. Ratings add efficiency: they make it easier to find your most important images in the archive and so devote more of your time to optimizing them. Images that contain PIE instruction sets are quicker and easier to process into derivative files than images that do not contain this data.

Organize by cataloging software or by computer directory structure?

Images organized with cataloging software are easier to find than images organized only in a computer directory structure of file folders, or stored on CDs or DVDs kept on shelves or in binders. In a folder-based archive, an image file can reside in only one folder. This makes locating images a daunting folder-by-folder task, especially if files are spread across hard drives or CDs/DVDs. Duplicating image files into multiple folders creates its own logistical problems, such as difficulty keeping track of versions and increased storage overhead.
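The advantage of a catalog is that one file on disk can appear under any number of keywords without being duplicated. The toy index below (Python 3.9+) illustrates that data structure only; the `ImageRecord` and `Catalog` classes are hypothetical and are not the data model of any real cataloging application.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    path: str                                   # the single file on disk
    keywords: set[str] = field(default_factory=set)
    rating: int = 0                             # e.g. 0-5 stars


class Catalog:
    """One record per file; many keyword index entries point to each record."""

    def __init__(self) -> None:
        self.records: dict[str, ImageRecord] = {}
        self.by_keyword: dict[str, set[str]] = defaultdict(set)

    def add(self, rec: ImageRecord) -> None:
        self.records[rec.path] = rec
        for kw in rec.keywords:
            self.by_keyword[kw.lower()].add(rec.path)  # case-insensitive index

    def search(self, *keywords: str, min_rating: int = 0) -> list[str]:
        """Paths tagged with all given keywords and rated at least min_rating."""
        if keywords:
            hits = set.intersection(
                *(self.by_keyword.get(k.lower(), set()) for k in keywords)
            )
        else:
            hits = set(self.records)
        return sorted(p for p in hits if self.records[p].rating >= min_rating)
```

A search across years and drives becomes a single lookup, and the rating filter narrows the results to your best images.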
Read more about directory structure

Migrating the archive

Not much is known about digital longevity on any particular medium. What is certain is that digital data deteriorates no matter what media it is stored on, and digital media may also deteriorate or fail. For optical media, use the highest-quality write-once media, since these use more stable dyes to hold the data. High-quality optical media also have better edge sealing, which guards against dye-eating microbes. Keeping optical media in jewel cases or CD notebooks, ideally ones with archival pages, guards against the three nemeses of optical storage: sunlight, heat, and scratches. To maximize optical media longevity:

  • Never use glued-on labels; the glue can eat into the dye layer.
  • Never write on the discs with solvent-based markers; the ink can eat into the dye layer.
  • Invest in archival pages or storage boxes.

dpBestflow recommends establishing a migration plan for your archive. Rather than worrying about whether your CDs/DVDs will outlast the 2-5 year estimates, plan to migrate the data to more current, stable, and potentially larger optical media. Blu-ray may become a viable option once it is cost-effective. A single-layer Blu-ray disc can replace five DVDs or more than thirty CDs, and Blu-ray has the potential to hold even more data, depending on how many layers are supported. We can’t be certain that Blu-ray discs will last longer than CDs/DVDs. Blu-ray manufacturers claim 30 to 300 years, but the technology is too new for a definitive answer. Blu-ray discs are, however, much more scratch-resistant, since they have a tough polymer hard coating (TDK’s version is called Durabis). Surface scratches are responsible for many CD/DVD failures, as any Netflix customer is well aware.
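The disc-count comparison above is simple arithmetic on nominal capacities. This sketch assumes approximate single-layer capacities of about 0.7 GB for a CD, 4.7 GB for a DVD, and 25 GB for a Blu-ray disc (decimal gigabytes) and ignores file-system overhead, so real counts may be slightly higher.

```python
import math

# Approximate nominal single-layer capacities in decimal gigabytes.
CAPACITY_GB = {"CD": 0.7, "DVD": 4.7, "Blu-ray": 25.0}


def discs_needed(archive_gb: float, media: str) -> int:
    """Number of discs needed to hold archive_gb (overhead ignored)."""
    return math.ceil(archive_gb / CAPACITY_GB[media])
```

With these figures one Blu-ray disc holds about 25 / 4.7 ≈ 5.3 DVDs or 25 / 0.7 ≈ 36 CDs, and a 100 GB archive needs 4 Blu-ray discs, 22 DVDs, or 143 CDs.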

Hard drives have a useful life span of about five years. The accumulating risk of failure, and their small size relative to current drives, make data migration prudent. The rapid increase in drive capacity also makes it efficient to consolidate the archive onto fewer drives before any drive is in danger of failing. The cost per GB of hard drive storage has been cut roughly in half each year; in 2009 the average cost was around 10 cents/GB. As your archive grows, migrating the image data to larger-capacity hard drives can protect against data deterioration and make the archive easier to manage. It is easier to maintain one terabyte drive than a small army of ten 120 GB drives.
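The halving trend and the consolidation argument can be put into numbers. This is a rough sketch that assumes the cost per GB keeps halving every year from about $0.10/GB in 2009, a trend the text describes but which is not guaranteed to continue.

```python
import math


def projected_cost_per_gb(year: int, base_year: int = 2009, base_cost: float = 0.10) -> float:
    """Dollars per GB, assuming the price halves every year after base_year."""
    return base_cost * 0.5 ** (year - base_year)


def drives_needed(archive_gb: float, drive_gb: float) -> int:
    """Number of drives of a given capacity needed to hold the archive."""
    return math.ceil(archive_gb / drive_gb)
```

Under that assumption, storage would cost about 2.5 cents/GB by 2011, and a 1,000 GB archive needs nine 120 GB drives but only one 1 TB drive.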

Media and file format obsolescence add to the problem of physical decay or failure of archive and backup media. The only solution is to keep moving the data to newer media, and periodically the image data itself may need to be converted to new formats. Current storage media may not seem to be in imminent danger of obsolescence, but consider this list of media now clogging landfills and storage trunks in basements and attics: Bernoulli, Jaz, Clik, SparQ, SyJet, floppy disks of all sizes, and 12” optical discs, among others. Migrating your archive ensures that you don’t leave viable image data trapped on unreadable media.
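A migration plan can start as a dated inventory of archive media that flags anything reaching its planning horizon. In this Python 3.9+ sketch the horizons are assumptions drawn from the estimates in this section (about five years for hard drives, the low end of the 2-5 year estimate for CDs/DVDs), and the inventory labels are hypothetical.

```python
from datetime import date

# Planning horizons in years: conservative assumptions, not measured lifetimes.
MIGRATE_AFTER_YEARS = {"hard drive": 5, "CD": 2, "DVD": 2}


def due_for_migration(inventory, today: date) -> list[str]:
    """inventory: iterable of (label, media_type, date_written) tuples.
    Returns the labels whose age has reached the planning horizon."""
    due = []
    for label, media, written in inventory:
        age_years = (today - written).days / 365.25
        if age_years >= MIGRATE_AFTER_YEARS[media]:
            due.append(label)
    return due
```

For example, with `today=date(2009, 6, 1)`, a hard drive written in March 2004 is flagged, while a DVD burned in June 2008 is not.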

Although media obsolescence is a more likely threat than file format obsolescence, there are many examples of data locked in obsolete, unreadable formats, especially proprietary ones. Camera makers have already orphaned some proprietary camera raw formats, and we are only a few years into the process. Consider that most raw formats are undocumented and are rewritten with every new camera launch.

Converting raw image data to JPEG or TIFF is one strategy for avoiding image format obsolescence, but the lossy compression of JPEG and the large size and fixed nature of TIFF are problematic. Converting to a standard raw format is a better choice for image archives, and currently Adobe DNG is the only candidate. Even DNG files may need to be migrated to a later DNG version or to a replacement format as yet unknown. The important feature of the current DNG specification is that all data is preserved, even data that is not understood or used by Adobe or third-party software. Although it is too early to tell how successful it will be, the Phase One EIP format may offer another path. It uses the open ZIP format to wrap the raw image data together with processing instructions and any applicable lens cast correction data. Whatever standard raw solution develops, keeping the raw capture files, proprietary or otherwise, separate from masterfiles and derivative files, as mentioned earlier, will make format migration easier to automate.
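Here is why the separation helps: when raw captures live in their own folder tree, a format migration can start by collecting every proprietary raw file under one root. This Python 3.9+ sketch uses an illustrative, incomplete list of proprietary raw extensions and leaves out DNG, the standard format you would be migrating to. It only builds the list; the conversion itself would be done by your converter of choice.

```python
from pathlib import Path

# Illustrative, incomplete list of proprietary raw file extensions.
PROPRIETARY_RAW = {".cr2", ".nef", ".arw", ".orf", ".raf", ".pef", ".iiq"}


def raw_files_to_migrate(raw_root: Path) -> list[Path]:
    """Collect every proprietary raw file under raw_root (case-insensitive)."""
    return sorted(
        p for p in raw_root.rglob("*")
        if p.is_file() and p.suffix.lower() in PROPRIETARY_RAW
    )
```

If masterfiles and derivatives were mixed into the same folders, the same scan would still work, but every batch would need extra checks to avoid touching files that should not change.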
