Digital Photo Archiving

Kate Bird
Graphics Librarian
Vancouver Sun & The Province

Creating Archiving Guidelines

The main goal in archiving photos is to correct or enhance caption information, keyword the content of the image, record the publishing information (paper, date and page on which the photo ran) and do a final check of the image record, so that it can be quickly and easily retrieved from the database in the future.

There is so much we would love to do with our image archives, but limited time and staff often force us to settle for what must, or can, be done. Below are the bare-bones decisions to be made.

Caption Information

The problem with image records is that their captions consist of only a few lines of text. Unlike news stories, which have several hundred searchable words as well as other searchable fields and keywords, photo captions provide very limited access to an image and make finding images in the database difficult. For this reason, corrections, additional information in the caption field and good keywording are critical to enhancing image records and improving the quality of the database.

You have to decide how you want to handle the caption field in your archive. It is handled in a variety of ways by different papers. Some ways include:

a) the original caption (as it was transmitted) is left intact. The caption field is never changed, corrected or added to; any changes are recorded in a different field or as a note
b) the published caption is added below the original caption or in another field. In our archive we do not add the published caption, relying on our text database if we ever need that information (very infrequent)
c) the corrected caption: the original caption with corrections, additions, IDs, etc. added by library staff (this is what we do in our database)

IPTC Header

You must also decide how to handle the IPTC header in your archive. The IPTC header was originally intended to provide a standardized set of information for each photo record. Unfortunately, time constraints and other factors have made the IPTC header a less powerful search tool than it could be, and many fields are left blank.

At our papers, where photographers shoot all photos with digital cameras, the IPTC fields they are required to fill out for each image are kept to a minimum. This does not make full use of the IPTC fields but has proven more realistic. They MUST fill out the following fields:

  • Filename (slug)
  • Credit (name of paper)
  • Byline (photographer name)
  • Caption (they must provide a complete caption ending with their credit line, e.g. Nick Didlick/Vancouver Sun)
  • City (filled out only when photographer is out of Vancouver)
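As a rough illustration of how these minimum requirements could be checked before an image is archived, the Python sketch below flags missing fields in a single record. It does not read real IPTC headers: the `PhotoRecord` class, its field names and the `out_of_town` flag are hypothetical stand-ins for the fields listed above, and it assumes the credit line takes the form photographer/paper, as in the example.

```python
from dataclasses import dataclass


# Hypothetical record: field names mirror the list above, not real IPTC tags.
@dataclass
class PhotoRecord:
    filename: str = ""          # slug
    credit: str = ""            # name of paper
    byline: str = ""            # photographer name
    caption: str = ""           # complete caption ending with the credit line
    city: str = ""              # needed only for out-of-town shoots
    out_of_town: bool = False   # hypothetical flag, not an IPTC field


def missing_fields(rec: PhotoRecord) -> list[str]:
    """List the required fields that are blank in one photo record."""
    missing = [name for name in ("filename", "credit", "byline", "caption")
               if not getattr(rec, name).strip()]
    if rec.out_of_town and not rec.city.strip():
        missing.append("city")
    # Assumes the credit line reads "Byline/Paper", e.g. "Nick Didlick/Vancouver Sun".
    if rec.caption.strip() and rec.byline.strip() and rec.credit.strip():
        credit_line = f"{rec.byline.strip()}/{rec.credit.strip()}"
        if not rec.caption.rstrip().rstrip(".").endswith(credit_line):
            missing.append("caption credit line")
    return missing


rec = PhotoRecord(filename="rally", credit="Vancouver Sun", byline="Nick Didlick",
                  caption="Protesters march downtown.")
print(missing_fields(rec))  # ['caption credit line']
```

A check like this only catches blank or incomplete fields; whether the caption is accurate still needs a person.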

Keywording Pros & Cons

There are plenty of arguments for keywording image records using a controlled list, for enhancing them with free-text terms in a separate field, or for doing neither. Some people feel that unless keywording is done to an industry-wide standard it is useless, and that it uses too much staff time and is highly subjective.

In many newspapers a controlled keyword list works well, because staff already keyword the text database and are accustomed to searching with controlled terms. If keywords are applied well (only to photos where they are relevant), searching with them can be very precise and time-efficient.

Many news photos do not require in-depth keywording (people photos, general news photos with good cutlines), but generic shots that can be reused in the future are good candidates for keywording.

Some libraries add uncontrolled terms to photo records in a separate field, giving searchers a wider range of terms with which to find images. This saves the time spent looking up terms in a controlled list, and the range of added words is much broader.

Either way, I feel some enhancement of the photo records in your digital photo archive is crucial. How much, and of what kind, should be determined by your own needs, the library practices already in place, and your staffing levels.

Creating a Keyword List

  • I developed a keyword list when we first got Merlin, basing it on the keyword list we used to enhance our text database. I recently revised the original list based on a review of past keyword usage, suggestions for new keywords from staff, and my own additions from experience searching the Merlin database. I amalgamated terms, changed terms and added new ones.
  • A keyword list should be developed to best serve your own uses. Our archive (and the resulting keyword list) is designed for editorial reuse and research, as well as for research and sales by the library's selling arm, Infoline.
  • Keyword lists work best when a basic list is developed and then revised periodically to add needed terms and to reflect changes in existing terms over time.
  • The beauty of databases is that keyword terms can stay flexible. If you decide a few years down the road that PROTESTS is a better keyword than DEMONSTRATIONS, you can change the keyword on all the records previously keyworded.
  • Below are sources to use when creating or revising your keyword list.

  • SLA News Division, Photo enhancement keywords: http://sunsite.unc.edu/slanews/conferences/sla1998/photokeywords.html
  • PACA (Picture Agency Council of America): http://www.pacaoffice.org/
  • Library of Congress, Thesaurus for Graphic Materials
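Changing a keyword across all previously keyworded records (for example, DEMONSTRATIONS to PROTESTS) is done through the database's own tools. Purely to illustrate the logic, here is a Python sketch over an in-memory mapping of hypothetical record IDs to keyword lists. It replaces one keyword with another and makes sure no record ends up carrying the new keyword twice.

```python
def rename_keyword(records: dict[str, list[str]], old: str, new: str) -> int:
    """Replace keyword `old` with `new` on every record; return the number changed."""
    changed = 0
    for record_id, keywords in records.items():
        if old not in keywords:
            continue
        updated: list[str] = []
        for kw in keywords:
            term = new if kw == old else kw
            if term not in updated:      # don't add a duplicate of `new`
                updated.append(term)
        records[record_id] = updated     # reassigning an existing key is safe mid-loop
        changed += 1
    return changed


# Hypothetical record IDs and keywords.
archive = {
    "A001": ["DEMONSTRATIONS", "POLICE"],
    "A002": ["FIRES"],
    "A003": ["DEMONSTRATIONS", "PROTESTS"],
}
print(rename_keyword(archive, "DEMONSTRATIONS", "PROTESTS"))  # 2
print(archive["A003"])  # ['PROTESTS']
```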

Creating a Thesaurus of Keyword Terms

  • A thesaurus is simply a detailed version of the keyword list that provides scope notes and defines terms by their relationships, i.e. broader, narrower and related terms.
  • I have a thesaurus for the Pacific Press digital archive, which includes images from the Vancouver Sun and The Province. If you would like a copy, please contact me.
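To show how scope notes and broader, narrower and related terms fit together, here is a minimal Python sketch of thesaurus entries. The entries are illustrative, based on the ART/SCULPTURE and TELEVISION/TV PROGRAMS examples discussed in this handout, and are not copied from the Pacific Press thesaurus. Each entry records only its broader terms; the narrower links are filled in automatically. SN, BT, NT and RT are the conventional thesaurus abbreviations.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    scope_note: str = ""                                  # SN: how the term is used
    broader: list[str] = field(default_factory=list)      # BT
    narrower: list[str] = field(default_factory=list)     # NT (filled in below)
    related: list[str] = field(default_factory=list)      # RT


def build_thesaurus(terms: list[Term]) -> dict[str, Term]:
    """Index terms by name and add each term to its broader terms' narrower lists."""
    thes = {t.name: t for t in terms}
    for t in terms:
        for b in t.broader:
            parent = thes.setdefault(b, Term(b))
            if t.name not in parent.narrower:
                parent.narrower.append(t.name)
    return thes


def describe(thes: dict[str, Term], name: str) -> str:
    """Format one entry the way a printed thesaurus might show it."""
    t = thes[name]
    lines = [name]
    if t.scope_note:
        lines.append(f"  SN {t.scope_note}")
    for label, items in (("BT", t.broader), ("NT", t.narrower), ("RT", t.related)):
        lines.extend(f"  {label} {item}" for item in items)
    return "\n".join(lines)


# Illustrative entries only.
thes = build_thesaurus([
    Term("ART"),
    Term("SCULPTURE", broader=["ART"]),
    Term("TELEVISION", scope_note="The medium of television or television sets.",
         related=["TV PROGRAMS"]),
    Term("TV PROGRAMS", scope_note="Photos from television shows.",
         related=["TELEVISION"]),
])
print(describe(thes, "ART"))
print(describe(thes, "TELEVISION"))
```

Printing the ART entry lists SCULPTURE as a narrower term, which is the kind of lookup a pop-up keyword list does not offer.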

Training

Indexing images is very different from indexing text. When text indexers switch to photo indexing, they often keyword to the caption information that accompanies the image rather than indexing the content of the image itself. Keywording to the caption often duplicates a term already in the caption, or adds a keyword that has nothing to do with the content of the image.

Keywording to the content of the image rather than the content of the caption adds concepts that are inherent in the image but not accessible through the caption. This is where keywording adds real value to the photo record: it provides terms and concepts the caption does not.

When keywording digital images, we have to think more about the way we used to index general picture files than the way we indexed news stories for our text database. We have to ask: would we have a picture file called this? Keywords that identify a person are an example. Would we have a general photo file called Authors, or cross-file a people picture to it? I feel the caption field, not the keywords, is the correct place to identify a person (i.e. author, musician, etc.).

Training materials could include the following documentation:

  • archiving guidelines
  • detailed archiving procedures
  • keyword list and keywording guidelines
  • thesaurus of keywords

Common keywording problems are:

1) Irrelevant Keywords

Irrelevant keywords arise from keywording to the caption rather than to the content of the image.

An example is the photo of Gillian Guess (attached). It has the keywords HEADSHOT, FASHIONS and TRIALS. This is a people picture which should simply have been keyworded SINGLE. The photo does not show trial proceedings or a courtroom, so the keyword TRIALS is irrelevant: if you searched under TRIALS, this image would not help your search. The keyword FASHION is also irrelevant, as it is meant for fashion shoots, not for people who are simply wearing clothing.

Also attached is a photo of Bill Reid. This record was keyworded DEATHS. Although Bill Reid had recently died, there was nothing about death in the photo itself. It is simply a people picture and should have been keyworded SINGLE. Errors like these make keyword searching less reliable than it should be.

Another example is a photo of children from Chernobyl at the PNE. The keyword NUCLEAR POWER refers to the mention of Chernobyl in the caption, but the photo does not illustrate nuclear power.

2) Not Using A Thesaurus

Another problem is that indexers use the pop-up keyword list in Merlin rather than the printed thesaurus version of the keyword list, which enables them to find more precise keywords. In the Bill Reid example, the record should have been keyworded SCULPTURE instead of ART; by using the thesaurus, the indexer would have found this narrower term.

The thesaurus also has scope notes that define how we use a term. For the attached photo from the TV show Ally McBeal, the indexer used TELEVISION, but the keyword should be TV PROGRAMS. TELEVISION is used for the medium of television or for television sets, as its scope note in the thesaurus explains.

Beyond the Basics

Depending on your database (Merlin, Preserver, etc.), a variety of advanced features may be available beyond the basics.

In Merlin, the "tagged text for caption search" feature provides a way to indicate (or tag) the name of the person who actually appears in a photo, as opposed to someone mentioned in the caption but not pictured. Merlin also has a user-defined thesaurus, so you can define a metropolitan area as a "parent" term and the areas and cities that make up that metro area as its "children". You can then search FIRES together with the "parent" term and retrieve photos of fires from all of the defined "child" areas.
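The parent/child search can be sketched as simple query expansion: the parent term is expanded to itself plus every child below it, and a photo matches if it carries the subject keyword and its location is any of the expanded places. This Python sketch is not how Merlin implements the feature; the `METRO` mapping (a partial list of Greater Vancouver municipalities), the `Photo` type and the sample records are all hypothetical.

```python
from typing import NamedTuple


class Photo(NamedTuple):
    photo_id: str
    keywords: tuple[str, ...]
    place: str


# Hypothetical parent -> children mapping for a user-defined geographic thesaurus.
METRO = {"GREATER VANCOUVER": ["VANCOUVER", "BURNABY", "RICHMOND", "SURREY"]}


def expand(term: str, hierarchy: dict[str, list[str]]) -> set[str]:
    """Return the term plus every descendant (children, grandchildren, ...)."""
    found = {term}
    stack = [term]
    while stack:
        for child in hierarchy.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def search(photos: list[Photo], keyword: str, place: str,
           hierarchy: dict[str, list[str]]) -> list[str]:
    """IDs of photos with `keyword` taken in `place` or any place below it."""
    places = expand(place, hierarchy)
    return [p.photo_id for p in photos if keyword in p.keywords and p.place in places]


photos = [
    Photo("P1", ("FIRES",), "BURNABY"),
    Photo("P2", ("FIRES",), "VICTORIA"),
    Photo("P3", ("FLOODS",), "SURREY"),
    Photo("P4", ("FIRES",), "GREATER VANCOUVER"),
]
print(search(photos, "FIRES", "GREATER VANCOUVER", METRO))  # ['P1', 'P4']
```

Because `expand` walks the hierarchy recursively, a child area could itself be given children without changing the search code.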

Other systems have similar features, as well as a number of user-definable fields that may be useful to your library or newsroom.