Metadata is important for organizing, claiming ownership of, and even adjusting your pictures. You'll want to know how to manage metadata so the work you do to your images does not get lost.
Metadata can be incredibly useful in both the short and long term. Some metadata can be used to describe the image and how much you like it. Some can describe who owns the photo and how to contact the owner. And some metadata can be used to adjust the picture to your liking. For raw images that are finished with PIEware, the "only" thing that really changes about the file, from the time of capture onward, is the metadata that is associated with it. If you're going to construct a safe and effective workflow, you'll need to handle this metadata properly.
You'll need to get an understanding of where the metadata lives, and how to manage and preserve it. Imagine an image archive of hundreds of thousands of photos where the best images are tagged with high ratings, images are organized around subject matter, and they are adjusted to fulfill the vision of the photographer. Now imagine stripping all the metadata from this collection, and being left with original image data – back at square one. You would have just lost a very significant portion of the collection's value.
Fortunately, making use of and preserving metadata is not that difficult, once you understand where it lives, how to move it from place to place, how to back it up, and how to attach it to images. Let's look at these issues.
There are three basic places metadata can live: embedded in the file itself, in a sidecar file, or in a catalog. Each of these has advantages and disadvantages. Let's look at these three homes.
|Figure 1 Metadata can live in one or more of these places: the file, a sidecar file, or a catalog.|
File types with robust XMP support, such as TIFF, JPEG, PSD and DNG, can all contain virtually unlimited metadata. This information lives in the file itself, and is carried from place to place as the file is transferred. Accordingly, embedded metadata is less likely to get separated from the file than metadata stored elsewhere. It has a couple of drawbacks, however. The first is that the information about your images will be scattered throughout the collection, rather than gathered together for a comprehensive view of the collection. Additionally, embedded metadata is only available when the file is visible to the computer you're using. If the images are offline, then the information may not be available.
We consider embedded metadata to be an ideal vehicle to convey the metadata from place to place, but it's not quite as useful as the ultimate repository for this information.
Because proprietary raw files are built with custom structures, and because they are undocumented, they may have very limited capability to safely hold embedded metadata. As a result, many applications choose to store the information alongside the file in a text document – a sidecar file. This has the advantage of preserving the raw file, but it comes at a steep file management price. If the sidecar file is not treated properly, it can get separated from the main file and the metadata can be lost. And, like embedded metadata, it does not gather all the important information in one single place.
We consider sidecar metadata to be a temporary solution that is made necessary because proprietary raw files are not properly built to contain necessary metadata, nor are proprietary raw files standardized.
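A routine check can lower the file-management cost of sidecars. The following is a minimal sketch, assuming each sidecar shares its raw file's base name and ends in `.xmp` (a common convention, but not a universal one). The function name `find_sidecar_problems` and the list of raw extensions are illustrative assumptions, not part of any particular program.

```python
from pathlib import Path

# Assumed raw extensions; adjust to the cameras you use.
RAW_EXTENSIONS = {".cr2", ".nef", ".arw", ".orf", ".raf"}


def find_sidecar_problems(folder):
    """Return (raws_without_sidecar, sidecars_without_raw) for one folder.

    Assumes a sidecar is named like its raw file but ends in .xmp,
    e.g. IMG_0001.CR2 -> IMG_0001.xmp.
    """
    raw_names = {}
    sidecar_names = {}
    for path in Path(folder).iterdir():
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix in RAW_EXTENSIONS:
            raw_names[path.stem.lower()] = path.name
        elif suffix == ".xmp":
            sidecar_names[path.stem.lower()] = path.name

    # Raw files that have no matching sidecar in the same folder.
    missing = sorted(name for stem, name in raw_names.items()
                     if stem not in sidecar_names)
    # Sidecars whose raw file has been moved, renamed, or deleted.
    orphans = sorted(name for stem, name in sidecar_names.items()
                     if stem not in raw_names)
    return missing, orphans
```

A raw file without a sidecar is not necessarily a problem, since it may simply have no metadata written yet. An orphaned sidecar, on the other hand, usually means the pair was separated during a move or rename.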
The most secure and efficient way to handle the metadata you create is by using catalog software. The catalog can harvest, manipulate, and save the metadata in a single document, even for many thousand files. We believe this is the best place to store the master copy of the metadata for your collection as a whole. Some of the advantages it provides:
- It can provide collection-wide searches
- It can enable off-line searching
- It facilitates the backup and restoration of this important and ever-changing set of data
- It provides a record of what's in the collection, which can be very important in the event of storage failure and restoration
We recommend that you employ a Master Catalog, or catalogs, as the ultimate storage location for your metadata.
In order to work with metadata-based programs safely and efficiently, you need to know which of the above locations the program(s) you use store your metadata in.
- A browser, like Adobe Bridge, Photo Mechanic, or Google Picasa, will be working with embedded or sidecar metadata. It assumes that "the truth is in the file". When you add a keyword in any of these programs, the software will write that keyword into the file or sidecar file.
- A cataloging program, like Lightroom, Media Pro or IDimager, will primarily work with catalog metadata. It assumes that "the truth is in the database". When you add a keyword with these programs, you are generally adding a keyword to the catalog database only, rather than to the actual file itself. These programs have the ability to synchronize all (or, in some cases, almost all) metadata back to the file or sidecar file.
Working with embedded metadata is pretty straightforward. When working with a browser, what you see is what you get. If the keyword is embedded in the file or sidecar file, it shows up in the browser software. But when you are working with catalog software, you see and make use of metadata that is not in the file itself, but, rather, is in the catalog. This can be confusing until you understand it. Let's look at the flow of metadata between image files and catalog software.
When you first put an image file "in" catalog software, the program harvests the existing metadata from the image file to store in the catalog database. Important: you are not putting the image file in the catalog. You are indexing the file with the catalog software. If there's a keyword in the image's embedded or sidecar metadata, the catalog should see that keyword and remember that it applies to this image. You'll see the counts go up in your keywords panel as the catalog finds images with a particular term. Likewise, thumbnails and previews are created during the indexing process, which is what the software shows you as you look through the collection.
Most catalog software also offers the ability to reharvest metadata in the event that it has been changed outside the catalog environment. This might happen because you used different software to work on the file, or maybe you just used a different catalog to work on the file.
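The harvest and reharvest steps can be illustrated with a toy catalog. This is a minimal sketch, not a description of how any particular catalog program works internally. The table layout and the names `open_catalog`, `harvest`, `keyword_counts`, and `files_with` are hypothetical, and the `keywords` argument stands in for whatever the software read from the file's embedded or sidecar metadata.

```python
import sqlite3


def open_catalog(location=":memory:"):
    """Create a toy catalog: one row per (file path, keyword) pair."""
    db = sqlite3.connect(location)
    db.execute("CREATE TABLE IF NOT EXISTS keywords (path TEXT, keyword TEXT)")
    return db


def harvest(db, path, keywords):
    """Index or re-index one file.

    `keywords` stands in for whatever was read from the file's embedded
    or sidecar metadata. Existing rows for the file are replaced, so a
    reharvest overwrites keywords that exist only in the catalog.
    """
    db.execute("DELETE FROM keywords WHERE path = ?", (path,))
    db.executemany(
        "INSERT INTO keywords (path, keyword) VALUES (?, ?)",
        [(path, keyword) for keyword in keywords],
    )
    db.commit()


def keyword_counts(db):
    """Return {keyword: number of files}, similar to a keywords panel."""
    rows = db.execute(
        "SELECT keyword, COUNT(DISTINCT path) FROM keywords GROUP BY keyword"
    )
    return dict(rows.fetchall())


def files_with(db, keyword):
    """Catalog-wide search; the image files themselves can be offline."""
    rows = db.execute(
        "SELECT DISTINCT path FROM keywords WHERE keyword = ? ORDER BY path",
        (keyword,),
    )
    return [row[0] for row in rows]
```

Note that the catalog stores only paths and keywords, never the image data. In this sketch a reharvest discards catalog-only changes for that file, which is why it is worth syncing catalog edits out to the file before reharvesting.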
After the catalog harvests the metadata for the files, the work you do to adjust this metadata is being done to the copy in the catalog itself, not to the embedded or sidecar metadata associated with the image. So if you add a keyword, you'll be adding it to the catalog, not the file itself. There will be times you'll want to push this information back into the file or sidecar file. This is called Exporting, Updating or Synchronizing, depending on the program you are using. They all work in a similar way.
Select the images you want to update, and then go to the menu command to update the metadata. The hard drives on which the files are stored will have to be connected, of course, in order to do the update. For a metadata sync, the process should be pretty quick, since it's only a small bit of data relative to the size of the image archive. If you are also rebuilding DNG previews to reflect changes in PIE settings, the process may take quite some time.
|Figure 2 You can update metadata in the file with a command in catalog software. In Lightroom, for instance, select an image and go to the menu item "Save Metadata to File". If you select "Update DNG Preview and Metadata", the embedded preview of a DNG file will also be changed to reflect the current develop settings of the file. In Media Pro, you can choose to reimport metadata, or to export back to the original file. You also have control over the metadata that will be updated.|
Should the metadata be constantly synchronized?
Some programs, such as Lightroom, offer the user a preference to keep embedded and sidecar metadata constantly up to date with catalog metadata. For most situations, we don't suggest this in a workflow. Writing changes back to files or sidecar files with each change in metadata creates lots of data traffic, which can lead to errors. It can also slow the program down significantly. We suggest that you perform a sync only when there is a good reason. The following list contains some good reasons to sync:
- You want to move the image from one machine to another, and embedded metadata is the best way to carry all the information along with you.
- You are doing a migration from one piece of software to another, and you want the second program to see all work created by the first one.
- You are getting ready to archive image files, and you want the archive backups to contain the important information and settings you've created for the working files.
Should I synchronize to back up the metadata?
You'll notice that the bullet list above does not mention anything about syncing to back up your data. Shouldn't a photographer embed the catalog metadata in order to protect it? We suggest that this is not the best way to back up the data – and indeed it is often an incomplete and potentially hazardous way to create a backup. If you want to back up all the work you do to your images in a catalog, the best thing to back up is the catalog itself. There are several reasons for this:
- Some programs don't write all information back to files. Lightroom, for instance, does not write collections, flags, virtual copies and develop history back to the file.
- In addition to containing all the data, backup of the catalog is considerably more efficient than backing up by syncing to the files. It's much easier to manage one catalog document than it is to manage changes written to tens of thousands of individual files.
- And there is an inherent danger to altering image files many times. Each time you touch a file, you introduce some risk of corruption. We suggest that the safest way to update embedded data is to do it in conjunction with some form of data validation. This argues for metadata syncing to be a much less frequent event.
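One possible form of data validation is a checksum manifest. The text does not specify a method, so the following is only a sketch under stated assumptions: it hashes whole files with SHA-256, and the names `file_digest`, `build_manifest`, and `changed_files` are hypothetical. Because a whole-file hash changes whenever embedded metadata is written, the manifest would need to be rebuilt right after each deliberate sync. It can then reveal unintended changes, or confirm that backup copies match.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its digest."""
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(folder, manifest):
    """Return manifest entries that are now missing or have a different digest."""
    current = build_manifest(folder)
    return sorted(
        name for name, digest in manifest.items()
        if current.get(name) != digest
    )
```

Running `changed_files` against a backup copy of the folder, using the manifest built from the working files, is one way to check that the backup is intact.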
Unfortunately, metadata can be lost, overwritten, stripped, or ignored. It can get lost in the transition from one program to another, from one file format to another, or in the creation of derivative files. The imaging industry is doing a lot of work to standardize and protect metadata, but it takes a while to make progress. And even when progress is made, existing files and software don’t necessarily support the new solutions. At the moment, if you want to preserve your metadata, you’ll want to check how it’s handled whenever you add new programs or practices to your workflow.
Watch out for strippers
The first thing you need is an awareness that metadata can be stripped, sometimes inadvertently. You’ll then want to check to see if any of your software regularly strips metadata. Older versions of Photoshop CS3 (10.0.0 and earlier), for instance, stripped metadata as part of the Save For Web process without asking users whether they wanted to do so (this was changed in version 10.0.1). If your metadata is important to you, it’s up to you to make sure that you don’t do anything to destroy it.
The most common time for metadata to be destroyed is when you make a new derivative file and the metadata is dropped in the conversion. Photoshop is generally very good about metadata preservation. Most DAM software does a good job with it as well. Most of the DAM software will give you some control over your metadata, offering you the ability to keep or strip it (or to strip parts of it).
Metadata can also get lost in a few other ways. A program might simply not recognize metadata when it opens a file. Mac Preview prior to OS X 10.5, for instance, did not see most IPTC data, including usage rights. And metadata can get lost if the program that creates it can’t attach it to the file. Extensis Portfolio, for instance, can attach metadata to TIFF and JPEG files, but can’t do so with raw or DNG files.
Test your software and workflow for metadata durability. You can use Bridge to examine the metadata inside conversion files to make sure that the programs and settings you’re using aren’t stripping metadata.
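A durability test can also be scripted. The following is a minimal sketch rather than a complete metadata reader. It assumes the keywords live in an XMP packet wrapped in `x:xmpmeta` and stored as uncompressed text in the file (typically the case for JPEG, TIFF, and DNG), and it checks only the `dc:subject` keyword list, not IPTC or other fields. The names `read_keywords` and `lost_keywords` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# Assumes the XMP packet is wrapped in an x:xmpmeta element.
XMP_PACKET = re.compile(rb"<x:xmpmeta.*?</x:xmpmeta>", re.DOTALL)


def read_keywords(data):
    """Return the set of dc:subject keywords in the first XMP packet, if any."""
    match = XMP_PACKET.search(data)
    if match is None:
        return set()
    try:
        root = ET.fromstring(match.group(0))
    except ET.ParseError:
        # Treat an unreadable packet the same as a missing one.
        return set()
    items = root.findall(".//dc:subject/rdf:Bag/rdf:li", NAMESPACES)
    return {item.text for item in items if item.text}


def lost_keywords(original, derivative):
    """Return keywords present in the original file but absent from the derivative."""
    before = read_keywords(Path(original).read_bytes())
    after = read_keywords(Path(derivative).read_bytes())
    return sorted(before - after)
```

An empty result from `lost_keywords` suggests the conversion preserved the keywords. It is still worth confirming the result in a browser such as Bridge, since this sketch looks at one field only.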