About digital collections
A Framework of Guidance for Building Good Digital Collections, by Priscilla Caplan et al
A report from the Digital Library Forum, a group convened by IMLS to discuss issues related to the management of networked digital libraries. Much of the work of the IMLS DCC project is based on recommendations from the Framework.
What is metadata?
Marketing with Metadata, by M. Moffat
Metadata is information about a resource (whether physical or digital). It can be descriptive information (such as that found in a library catalog record), administrative (such as that describing the donor of a museum item), or structural (such as that describing the technical details of a digitized image).
How does metadata affect my collection?
Metadata allows users to discover resources that might be useful to them. It also enables a user to decide whether to select a particular resource. For example, the user can determine whether he or she has the technical capability needed to view a .PDF file. Metadata also tells the user where to find the resource, and it allows administrators of collections to manage their resources.
There are many metadata schemas in use by many different communities. Much effort has been made to map between different schemas so that different collections can be searched and used together. These mapping tools are called crosswalks. In addition, there are many different controlled vocabularies, or taxonomies. There are fewer crosswalks available for taxonomies, although automatically mapping between them is being actively researched.
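A crosswalk can be pictured as a lookup table from the fields of one schema to the elements of another. The sketch below is a minimal illustration, assuming a record that has already been parsed into a dictionary keyed by MARC tag. The three mappings shown (245 to title, 100 to creator, 650 to subject) follow common MARC-to-Dublin Core practice; the record layout, the sample values, and the function name `crosswalk` are hypothetical.

```python
# Minimal crosswalk sketch: map a few MARC tags to simple Dublin Core elements.
# The record layout (tag -> list of strings) is an assumption for illustration.
MARC_TO_DC = {
    "245": "title",    # title statement
    "100": "creator",  # main entry, personal name
    "650": "subject",  # topical subject heading
}


def crosswalk(marc_record):
    """Return a dict of Dublin Core element -> list of values."""
    dc_record = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no mapping defined, so the field is dropped
        dc_record.setdefault(element, []).extend(values)
    return dc_record


# Invented sample record, including one local field with no mapping.
sample = {
    "245": ["Photographs of early flight"],
    "650": ["Aeronautics", "Gliders"],
    "999": ["local-only note"],
}
print(crosswalk(sample))
```

Fields without a mapping are silently dropped in this sketch, which is one reason crosswalks between schemas can lose information.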
The metadata that describes the digital resources in National Leadership Grant (NLG) collections is a key component of this project.
We will create a collection-level description (metadata describing a collection as a whole rather than the individual items in it) for every NLG project with digital content and store it in a collection registry.
In addition, we will harvest the item-level metadata in these collections using the Open Archives Initiative Protocol for Metadata Harvesting and create a central repository of the aggregated metadata.
Introduction to Metadata, from the Getty Research Institute
An excellent basic overview of what metadata is, why it's important, and what it is used for.
Some observations on metadata and digital libraries, by Caroline Arms
An interesting paper given at the Conference on Bibliographic Control in the New Millennium in 2000 on the role of metadata within digital libraries.
Dublin Core is a simple and flexible metadata standard. The DCMI home page provides current information about the standard and specific applications for its use. Registered Open Archives Initiative data providers are required to provide metadata in at least Dublin Core for interoperability purposes. Additional schemas may also be provided.
As part of a multi-state initiative to create a virtual collection of widely dispersed digital resources, representatives from several cultural heritage institutions developed these best practices.
MARC Standards, The Library of Congress
The standard used by libraries to describe bibliographic and other resources.
The standard used to encode archival finding aids.
What is a collection registry?
A collection registry, as we envision it, provides access, services, and additional functionality to a database of collection descriptions.
Why provide access at the collection level?
The registry will offer users access to collections that are not easily found or that do not have accessible catalogs. For example, Cornucopia, a registry of museum collections in the United Kingdom, provides access to many collections that otherwise are not easy to find.
Building a trusted registry of high-quality, important collections helps to promote the collections' visibility and improve access for all users. The National Science Digital Library (NSDL) is a good example.
The NSDL stores collection descriptions in a metadata repository alongside item level metadata. While not every collection within the NSDL will have item level metadata, every collection must have a collection description. The collection description is based on Qualified Dublin Core. GEM subject headings are assigned in addition to whatever subject headings are submitted. Collections can be submitted by contributors (see http://nsdl.org/recommend/collection) or are created by NSDL staff.
Registry of museum collections maintained by Resource: The Council for Museums, Archives, and Libraries in the United Kingdom. The collection description is based on the RSLP CD schema.
Provides access to finding aids and collection guides describing a wide range of materials such as manuscripts, photographs, and works of art held in libraries, museums, archives, and other institutions across California. The collection description schema used is EAD.
What is a collection-description schema?
A collection-description schema is simply a metadata schema that is designed to describe a collection of resources rather than individual items. For example, a collection description of the Wright Brothers Negatives, held by the Library of Congress, looks like this:
There are only a few standard collection-description schemas. In some cases metadata schemas that were originally developed for individual resources have been adapted to describe a collection. (MARC has been used in this way.)
The Encoded Archival Description (EAD) was designed to encode archival finding aids that describe library and museum collections. An EAD document contains both a top-level collection description and descriptions of individual resources or groups of resources.
The Research Support Libraries Programme (RSLP) in the UK developed a collection-description schema to describe their collections in a consistent manner. Several planned collection registries, including one for the digitized collections emerging from the NOF-Digitise initiative, will be using the RSLP schema as a basis.
A special issue for collection-level description (D-Lib Magazine, September 2000)
Digital Collections, Digital Libraries, and the Digitization of Cultural Heritage Information (Clifford Lynch) in First Monday, v. 7, no. 5, May 2002
An article based on the keynote address given at the 2002 Web-Wise conference.
Geisler, Gary et al. 2002. "Creating Virtual Collections in Digital Libraries: Benefits and Implementation Issues." Proceedings of the second ACM/IEEE-CS joint conference on Digital Libraries. Pgs. 210-218.
Provides insight into the use of collection descriptions in the iLumina project and the Open Video Project at the University of North Carolina at Chapel Hill.
Hill, Linda L. et al. 1999. "Collection Metadata Solutions for Digital Library Applications." Journal of the American Society for Information Science. 50(13):1169-1181.
Describes the process of creating collection metadata for a digital library.
This schema was developed to enable RSLP projects to describe their collections in a consistent and machine-readable way. The schema was released in 2000.
This proposed schema is based upon the RSLP CD. For more information see the Collection Description Working Group of the Dublin Core Metadata Initiative (DCMI).
The CIC collection description format is based on the most recent proposal available from the Dublin Core Collection working group for a Collection application profile. This proposal was made on August 20, 2004. The collection description format is intended to give data providers a way to describe the collections they expose. From those descriptions, it will be possible to extract contextual information to display to end users when they browse records on the CIC metadata portal.
This is the schema for encoding archival finding aids. It has also been used to describe museum collections (‘collection guides’) as in the Museum and Online Archives of California project.
What is interoperability?
Interoperability has been defined as the "ability of systems, services, and organizations to work together seamlessly toward common or diverse goals" (from the Open Archives Forum "OAI For Beginners Tutorial"). In the context of the IMLS DCC project, interoperability means the ability to seamlessly share and access content from different digital collections. Metadata standards facilitate interoperability by making it easier to exchange metadata and use crosswalks. The Open Archives Initiative Protocol for Metadata Harvesting enables interoperability by allowing data providers to share metadata with service providers, who then build services around the metadata collected from multiple institutions.
How do you make a collection interoperable?
- Build an infrastructure.
Sharing metadata requires an infrastructure that supports it. Protocols like Z39.50 (distributed searching) and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, metadata harvesting) make it possible for a digital library or other service provider to bring together metadata from many collections and present it to the user in one search interface.
- Create interoperable metadata.
Usually, you will be sharing the metadata for your collection, not the content itself. But sharing your metadata isn't enough. Because organizations use metadata in different ways, you need to think carefully about your metadata. Does it make sense outside of its native database? Does it include references that someone in another location would understand? (See shareable metadata below.)
- Clarify rights.
You may have content that is only available to certain individuals and groups. You can still share the metadata that describes this content without opening up access to the items themselves. When you share item-level metadata with IMLS DCC, you should include a Rights Statement. That way, when users examine your metadata records to learn about the content of your collection, they will also learn that access to your content is restricted. For example, here's the rights statement for the collection of Wright Brothers negatives at the Library of Congress.
What is shareable metadata?
(The following is based on the CIC-OAI project recommendations for Dublin Core metadata providers)
1. Metadata is used for both information discovery and display, so it must contain information formatted for both.
2. Whenever possible, provide the native metadata schema via the OAI Protocol.
3. For digital objects that are representations or surrogates of physical objects, the descriptive metadata should describe the original physical object. Descriptive metadata specific to the digital surrogate should also be included, and the URI of the digital surrogate should be placed in the dc:identifier element. In this case, the object type is not 'Physical object'. The Dublin Core 'one to one' rule does not apply here.
4. Do not merge elements when they are distinct in the original metadata record (e.g., subject 1 ; subject 2 ...).
5. Do not include empty elements or elements with no informational value (such as not available or n/a). Note that the value 'unknown' might have some informational value; use your judgement.
6. Repeat elements and element content as many times as needed for adequate resource discovery (e.g., the same geographic string may appear once in subject and once in coverage if you think this is necessary).
7. To express more complex semantics within simple Dublin Core elements, indicate refinements within the value. For example, to indicate a collection that a resource is a part of: <dc:relation>Is Part Of: Teaching with Digital Content</dc:relation>.
8. A single dc:identifier element shall contain a URI. This URI points to the resource for display purposes; any other URI must go in another element. For example, if a URI actually points to the collection's homepage, it can be recorded in the dc:publisher or dc:relation element. Additional dc:identifier elements that do not contain a URI (such as an ISBN) are acceptable.
9. Make clear whether the end user will access a digital resource or a description of a physical resource (or a finding aid). If there is no existing digital material, make that clear, e.g., by not writing any URI in the dc:identifier element (but rather in another element such as relation or publisher) and/or by recording a dc:type of PhysicalObject, possibly in addition to any other dc:type.
10. Do not create multiple records pointing to the same URI (identifier).
11. If access to your resource is restricted, this shall be mentioned in the dc:rights (accessRights) element, with the designation of the categories of persons who are granted access (written for the benefit of end users).
12. Indicate the collection the item belongs to in the Relation (isPartOf) element.
13. Whenever possible, use a controlled vocabulary or encoding scheme.
14. Whenever possible, name the controlled vocabulary, particularly for subject. In simple Dublin Core this is possible by adding the controlled vocabulary to the value in brackets. For example, <dc:subject>United States--Politics and government--1857-1861. [LCSH]</dc:subject>
15. Include the DCMI type in the dc:type element in addition to any other more specific type (preferably from a controlled vocabulary such as the LC Thesaurus of Graphic Materials II). For instance, if the object described is a lithograph, you might include both 'lithograph' and 'image'.
16. Include the Internet Media Type encoding scheme in the dc:format element in addition to any other formats (such as the physical dimensions of the object). These are available at: http://www.iana.org/assignments/media-types/index.html . Please note that the first level (image for example) can be used if an appropriate media type can't be found in this list.
17. Include the ISO 639-2 encoding scheme for the dc:language element where possible.
18. Use an appropriate standard encoding scheme for dates and for the temporal aspect of the coverage element.
19. Avoid local jargon or language; if you do use it, use it in addition to a controlled vocabulary, not instead of one.
20. Make sure your metadata meets the 'On the Horse' test. Take a look at your metadata without the resource it describes and outside of its website (copy to a word document for instance). Conduct a usability test. Can the user tell you what this metadata describes?
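Some of the recommendations above lend themselves to simple automated checks. The following is a rough sketch, not part of the CIC-OAI recommendations themselves: it models a simple Dublin Core record as a list of (element, value) pairs and flags placeholder values (recommendation 5), subject values that look merged (recommendation 4), and more than one URI in dc:identifier (recommendation 8). The record layout, the function name `check_record`, and the example.org URLs are all assumptions made for illustration.

```python
# Sketch of automated checks for a few of the recommendations above.
# A record is modelled as a list of (element, value) pairs; this layout and
# the function name are assumptions, not part of the recommendations.
PLACEHOLDERS = {"", "n/a", "not available"}


def check_record(record):
    """Return a list of human-readable warnings for one simple DC record."""
    warnings = []
    uri_identifiers = 0
    for element, value in record:
        text = value.strip()
        if text.lower() in PLACEHOLDERS:
            warnings.append(f"{element}: empty or placeholder value (rec. 5)")
            continue
        if element == "subject" and ";" in text:
            warnings.append(f"subject: looks like merged values (rec. 4): {text}")
        if element == "identifier" and text.startswith(("http://", "https://")):
            uri_identifiers += 1
    if uri_identifiers > 1:
        warnings.append("identifier: more than one URI (rec. 8)")
    return warnings


# Invented record that breaks three of the recommendations.
record = [
    ("title", "Glider in flight"),
    ("subject", "Aeronautics; Gliders"),
    ("description", "n/a"),
    ("identifier", "http://example.org/item/1"),
    ("identifier", "http://example.org/collection"),
]
for warning in check_record(record):
    print(warning)
```

Checks like these catch only mechanical problems. Whether a record makes sense outside its native context (recommendation 20) still needs a human reader.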
What is the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH)?
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) supports interoperability between disparate collections of metadata. It achieves interoperability through metadata harvesting rather than through distributed searches as in the Z39.50 protocol.
In Z39.50, searches are requested simultaneously across multiple metadata providers. Each provider searches its own metadata and returns results to the search service, which aggregates results from all responding providers.
In contrast, when OAI-PMH is used, the metadata itself is completely or selectively harvested from each metadata provider and aggregated in a central location. Searches are then performed within this central repository. The Open Archives Initiative is the organization responsible for the metadata harvesting protocol used in the IMLS DCC project.
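The difference between the two models can be illustrated with a toy example. The sketch below uses in-memory lists in place of real metadata providers, so no network access or actual protocol is involved; the provider names, records, and function names are invented for illustration.

```python
# Toy contrast between the two models, using in-memory "providers".
# Provider names and records are invented; no network access is involved.
PROVIDERS = {
    "museum": ["Civil War letters", "Quilt patterns"],
    "library": ["Civil War maps", "Railroad timetables"],
}


def distributed_search(term):
    """Distributed model: ask every provider at query time, merge the answers."""
    results = []
    for name, records in PROVIDERS.items():
        results.extend((name, r) for r in records if term.lower() in r.lower())
    return results


def harvest():
    """Harvesting model: copy all metadata into one central store in advance."""
    return [(name, r) for name, records in PROVIDERS.items() for r in records]


def central_search(repository, term):
    """Search only the central store; providers are not contacted."""
    return [(name, r) for name, r in repository if term.lower() in r.lower()]


repository = harvest()
print(distributed_search("civil war"))
print(central_search(repository, "civil war"))
```

Both searches return the same records here. The practical difference is when the providers are contacted: at every query in the distributed model, and only at harvest time in the harvesting model.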
The OAI protocol is based on XML and HTTP. A service provider sends an HTTP request to a data provider using the protocol. The data provider, which has implemented the OAI protocol, responds to the request by sending an XML document through HTTP. In this way, the service provider can learn who the data provider is (Identify request), what metadata formats it supports (ListMetadataFormats request), and how it has divided its metadata (ListSets request). The service provider can also request the metadata itself (GetRecord, ListIdentifiers, and ListRecords requests).
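Because each request is an ordinary HTTP request carrying a `verb` parameter, the request URLs are easy to construct. The sketch below builds them with Python's standard library. The base URL is a hypothetical placeholder, and `oai_request_url` is an illustrative helper name; the verbs and the `metadataPrefix=oai_dc` argument come from the protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a data provider; replace with a real endpoint.
BASE_URL = "http://example.org/oai"


def oai_request_url(verb, **arguments):
    """Build the URL for one OAI-PMH request (verb plus optional arguments)."""
    query = {"verb": verb, **arguments}
    return f"{BASE_URL}?{urlencode(query)}"


print(oai_request_url("Identify"))
print(oai_request_url("ListMetadataFormats"))
print(oai_request_url("ListSets"))
print(oai_request_url("ListRecords", metadataPrefix="oai_dc"))
```

Fetching one of these URLs from a working data provider returns an XML document that the service provider then parses.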
The OAI protocol is metadata neutral: it can be used with any metadata format. However, OAI-compliant metadata providers (those who register with the Open Archives Initiative) must provide metadata in at least Dublin Core.
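As an illustration of what a harvester does with Dublin Core metadata, the sketch below reads element values from a single `oai_dc` record using Python's standard library. The two namespace URIs are the ones used for simple Dublin Core in OAI-PMH; the sample record is hand-written with invented values, and `dc_values` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample record; the values are invented for illustration.
SAMPLE = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Glider in flight</dc:title>
  <dc:subject>Aeronautics</dc:subject>
  <dc:subject>Gliders</dc:subject>
  <dc:type>Image</dc:type>
</oai_dc:dc>
"""


def dc_values(xml_text, element):
    """Return every value of one Dublin Core element, in document order."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall(f"dc:{element}", NS)]


print(dc_values(SAMPLE, "title"))
print(dc_values(SAMPLE, "subject"))
```

Returning a list for every element reflects the fact that any simple Dublin Core element may be repeated or absent.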
How is the OAI protocol going to be used in the IMLS DCC project?
We will build a repository for metadata describing the item-level content of digital collections created through the IMLS National Leadership Grant program. We will harvest metadata from these NLG projects using the OAI protocol. Our goal is to provide assistance and tools to NLG projects to make it possible for their metadata to be harvested using the protocol.
There are two options for participation in the item-level repository:
Option 1: Become a full OAI data provider.
Option 2: Become a static OAI data provider.
Option 1: Full OAI Data Provider
You can become an OAI metadata provider whether you have a database or a file-based system. Becoming a full data provider is the best option for projects:
• Actively adding metadata to their collection
• With a large collection of metadata (over 5000 records)
Requirements for a database system:
• Database application (e.g. MySQL, Oracle, MS Access, MS SQL)
• Web server with CGI capability (e.g. Apache/Tomcat, MS IIS)
• Validating, transforming XML parser (e.g. Xerces, Sun’s JavaXMLPack, MSXML)
Requirements for a file-based system:
• Metadata in XML or available for IMLS DCC to put into XML
• Web server with CGI capability (e.g. Apache/Tomcat, MS IIS)
• Validating, transforming XML parser (e.g. Xerces, Sun’s JavaXMLPack, MSXML)
Option 2: Static OAI Data Provider
To become a static data provider, you will store your metadata records in a single, static XML file. The XML file is then exposed for harvesting using a 3rd-party gateway (which we will provide). This is the best option for projects:
• No longer adding metadata to their collection
• With small collections (fewer than 5000 records)
Requirements to become a static data provider:
• Metadata in XML. (IMLS DCC will help with conversions.)
• Available space on a web server for posting static XML files
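The guidance for choosing between the two options can be summarized as a simple rule of thumb. The sketch below encodes it, with two assumptions the text does not settle: a project that meets either Option 1 criterion is pointed to Option 1, and a collection of exactly 5000 records is treated as small. The function name `recommend_option` is hypothetical.

```python
# Thresholds come from the guidance above. Treating exactly 5000 records as
# "small", and combining the two criteria with "or", are assumptions.
LARGE_COLLECTION = 5000


def recommend_option(record_count, actively_adding):
    """Suggest a participation option for an NLG project."""
    if actively_adding or record_count > LARGE_COLLECTION:
        return "Option 1: full OAI data provider"
    return "Option 2: static OAI data provider"


print(recommend_option(record_count=12000, actively_adding=False))
print(recommend_option(record_count=800, actively_adding=False))
```

A project's available infrastructure (database, web server, XML parser) also matters, so this rule is a starting point rather than a decision.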
Metadata Harvesting and the Open Archives Initiative (Clifford Lynch) ARL Bimonthly Report, no. 217, August 2001.
A good overview of the OAI and its significance.
This Q&A covers what OAI-PMH is and how you can become involved.
The technical framework for the protocol.
An excellent tutorial that covers both the history and technical aspects of the protocol. Highly recommended.
This was an effort to create and implement a suite of OAI-based metadata harvesting services, search services, and tools to facilitate discovery and retrieval of cultural heritage resources. It serves as the basis for the IMLS DCC project. Here you can find an OAI-PMH tutorial (ppt) first presented at JCDL 2003.
Here are a few sites that provide access to metadata harvested through the OAI protocol:
OCLC's gateway to a collection of freely available, difficult-to-access, academically-oriented digital resources that are easily searchable by anyone.
A collaborative project involving NASA Langley, Old Dominion University, the University of Virginia, and Virginia Tech.
What is an item level repository?
When we harvest metadata from the NLG projects, we aggregate the data in one location, called a repository. The repository acts as a portal to the item-level records for digital content in NLG collections.
Why provide access to item-level records?
The repository promotes the visibility and usability of NLG collections. It works in concert with the collection registry so that a user can more easily discover what types of resources exist in these collections. For example, a user can first search the registry for all NLG collections with content related to the Civil War. Normally, in order to see the individual item records for this content, the user would need to search each collection individually. The IMLS DCC project makes it possible for the user to retrieve item-level records for Civil War-related content—in all NLG collections—with one search.