Documentation
IMLS DCC Collection Description
Metadata Application Profile:
Item Level Metadata Schemas
and Profiles:
Metadata crosswalks (in form
of XSLTs) in use:
Important Resources
We hope these descriptions,
web sites, and articles help you learn more about metadata, registries,
and other ways to expose your collection to a wider audience. Click
a topic below.
About Digital Collections
What
is metadata?
General Resources
about Metadata
Examples of Metadata
Schemas
What
is a collection registry?
Examples of Collection
Registries
What is a
collection-level description metadata schema?
General Resources
about Collection-Level Description
Examples of Collection-Level
Description Metadata Schemas
What
is interoperability?
What
is shareable metadata?
What is the
Open Archives Initiative (OAI) Protocol for Metadata Harvesting
(PMH)?
What
are the options for implementing OAI metadata provider services?
General Resources
about the Open Archives Initiative (OAI)
Examples of OAI Service Providers
What is an
item-level repository?
About digital
collections (back to top)
A
Framework of Guidance for Building Good Digital Collections,
by Priscilla Caplan et al.
A report from the Digital Library Forum, a group convened by
IMLS to discuss issues related to the management of networked
digital libraries. Much of the work of the IMLS DCC project
is based on recommendations from the Framework.
What
is metadata? (back to top)
Marketing
with Metadata, by M. Moffat
Metadata is information about a resource
(whether physical or digital). It can be descriptive information
(such as that found in a library catalog record), administrative
(such as that describing the donor of a museum item), or structural
(such as that describing the technical details of a digitized image).
How does metadata affect
my collection?
Metadata allows users to discover
resources that might be useful to them. It also enables a user to
decide whether to select a particular resource. For example, the
user can determine whether he or she has the technical capability
needed to view a .PDF file. Metadata also tells the user where to
find the resource, and it allows administrators of collections to
manage their resources.
There are many metadata schemas in
use by many different communities. Much effort has been made to
map between different schemas so that different collections can
be searched and used together. These mapping tools are called crosswalks.
In addition, there are many different controlled vocabularies, or
taxonomies. There are fewer crosswalks available for taxonomies,
although automatically mapping between them is being actively researched.
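To make the idea of a crosswalk concrete, here is a minimal sketch in Python. It assumes a record is a simple dictionary of field tags and values, and it maps only a handful of MARC fields to simple Dublin Core elements. The field selection, the `crosswalk` function, and the sample record are illustrative assumptions, not the crosswalks used in this project (those are XSLTs).

```python
# Illustrative, simplified crosswalk from a few MARC fields to simple
# Dublin Core elements. This is an assumption-laden sketch, not a
# complete or authoritative mapping.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260b": "publisher",
    "520": "description",
    "650": "subject",
}


def crosswalk(marc_record):
    """Map a MARC-like dict of {tag: [values]} to {dc_element: [values]}."""
    dc_record = {}
    for tag, values in marc_record.items():
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # no mapping defined: this sketch drops the field
        dc_record.setdefault(element, []).extend(values)
    return dc_record


# Made-up sample record for demonstration only.
sample = {
    "245": ["Views of the prairie"],
    "650": ["Prairies", "Landscape photography"],
    "999": ["local note"],
}
result = crosswalk(sample)
```

Note that the unmapped local field is silently lost. This is typical of crosswalks between schemas of different richness, and it is one reason to think about what your metadata looks like after it has been mapped.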
The metadata that describes the digital
resources in National Leadership Grant (NLG) collections is a key
component of this project.
We will create collection-level
description metadata (metadata describing a collection
as a whole rather than the individual items in it) for every NLG
project with digital content and store it in a collection registry.
In addition, we will harvest the item-level
metadata in these collections using the Open
Archives Initiative Protocol for Metadata Harvesting and
create a central repository of the aggregated metadata.
General Resources
About Metadata (back to
top)
Introduction
to Metadata, from the Getty Research Institute.
An excellent basic overview of
what metadata is, why it's important, and what it is used for.
A
primer on metadata, from the National Science Digital Library.
What to consider when creating
metadata for a large-scale digital library.
Some
observations on metadata and digital libraries, by Caroline
Arms.
An interesting paper given at
the Conference on Bibliographic Control in the New Millennium
in 2000 on the role of metadata within digital libraries.
Examples
of Metadata Schemas (back
to top)
Dublin
Core Metadata Initiative
Dublin Core is a simple and flexible
metadata standard. The DCMI home page provides current information
about the standard and specific applications for its use. Registered
Open Archives Initiative data providers are required to provide
metadata in at least Dublin Core for interoperability purposes.
Additional schemas may also be provided.
Western
States Dublin Core Metadata Best Practices
A part of a multi-state initiative
to create a virtual collection of widely dispersed digital resources,
representatives from several cultural heritage institutions
developed these best practices.
MARC
Standards, The Library of Congress
The standard used by libraries
to describe bibliographic and other resources.
Encoded
Archival Description (EAD)
The standard used to encode archival
finding aids.
What
is a collection registry? (back
to top)
A collection registry, as we envision
it, provides access, services, and additional functionality to a
database of collection descriptions.
Why provide access at the
collection level?
The registry will offer users access
to collections that are not easily found or that do not have accessible
catalogs. For example, Cornucopia,
a registry of museum collections in the United Kingdom, provides
access to many collections that otherwise are not easy to find.
Building a trusted registry of high-quality,
important collections helps to promote the collections' visibility
and improve access for all users. The National
Science Digital Library (NSDL) is a good example. EnrichUK
is another example of a registry of digital collections.
One of our goals is to build a collection
registry of all National Leadership Grant (NLG) projects that include
digital content. To help us learn what you think should be
included in the registry's collection-description schema,
we distributed a survey in Fall 2003 to all NLG projects with
digital collections funded between 1998 and 2002. We plan to
send a similar survey in Spring 2004 to the NLG projects funded
in 2003.
Examples of Collection Registries
(back to top)
National Science Digital
Library (NSDL)
The NSDL stores collection descriptions in a metadata
repository alongside item level metadata. While not every collection
within the NSDL will have item level metadata, every collection
must have a collection description. The collection description
is based on Qualified Dublin Core. GEM subject headings are
assigned in addition to whatever subject headings are submitted.
Collections can be submitted by contributors (see http://metamanagement.comm.nsdlib.org/collection_form.html)
or are created by NSDL staff.
Cornucopia: Discovering
UK Collections
Registry of museum collections maintained by Resource: The
Council for Museums, Archives, and Libraries in the United Kingdom.
The collection description is based on the RSLP CD schema.
iLumina:
Educational Resources for Science & Mathematics
Virtual ‘special’ collections are grouped by subject
matter. The collection description is based on RSLP CD schema.
Online Archive of California
Provides access to finding aids and collection guides describing
a wide range of materials such as manuscripts, photographs,
and works of art held in libraries, museums, archives, and other
institutions across California. The collection description schema
used is EAD.
EnrichUK
A gateway to the digital collections supported through the
New Opportunities Fund. The collection description is based
on the RSLP CD schema.
What
is a collection-description schema? (back
to top)
A collection-description schema is
simply a metadata schema that is designed to describe a collection
of resources rather than individual items. For example, a collection
description of the Wright Brothers Negatives, held by the Library
of Congress, looks like this:
There are only a few standard collection-description
schemas. In some cases metadata schemas that were originally developed
for individual resources have been adapted to describe a collection.
(MARC has been used in this way.)
The Encoded
Archival Description (EAD) was designed to encode archival finding
aids that describe library and museum collections. An EAD contains
both a top-level collection description as well as descriptions
of individual resources or groups of resources.
The Research
Support Libraries Programme (RSLP) in the UK developed a collection-description
schema to describe their collections in a consistent manner. Several
planned collection registries, including one for the digitized collections
emerging from the NOF-Digitise
initiative, will be using the RSLP schema as a basis.
General Resources
for Collection-Level Descriptions (back
to top)
A
special issue for collection-level description (D-Lib Magazine,
September 2000)
Digital
Collections, Digital Libraries, and the Digitization of Cultural
Heritage Information (Clifford Lynch) in First Monday, v.
7, no. 5, May 2002
An article based on the keynote
address given at the 2002 Web-Wise conference.
Geisler, Gary et al. 2002. “Creating
Virtual Collections in Digital Libraries: Benefits and Implementation
Issues.” Proceedings of the second ACM/IEEE-CS joint
conference on Digital Libraries. Pgs. 210-218.
Provides insight into the
use of collection descriptions in the iLumina
project and the Open
Video Project at the University of North Carolina at Chapel
Hill.
Hill, Linda L. et al. 1999. "Collection
Metadata Solutions for Digital Library Applications." Journal
of the American Society for Information Science. 50(13):1169-1181.
Describes the process of creating collection metadata for
a digital library.
Examples
of Description Schemas (back
to top)
RSLP
(Research Support Libraries Programme) Collection Description
Schema
This schema was developed to
enable RSLP projects to describe their collections in a consistent
and machine readable way. The schema was released in 2000.
Proposed
Dublin Core Collection Description
This proposed schema is based
upon the RSLP CD. For more information see the Collection
Description Working Group of the Dublin Core Metadata Initiative
(DCMI).
CIC
Collection Description Format
The CIC collection description format is based on
the most recent proposal for a Collection application profile
made by the Dublin Core Collection working group. This proposal
was made on August 20, 2004. The collection description
format is intended to provide a way for data providers to describe
the collections they expose. From those descriptions, it will
be possible to extract contextual information to display to
end users when they browse records on the CIC metadata portal.
Encoded
Archival Description (EAD)
This is the schema for encoding
archival finding aids. It has also been used to describe museum
collections (‘collection guides’) as in the Museum
and Online Archives of California project.
What
is interoperability? (back
to top)
Interoperability has been defined
as the "ability of systems, services, and organizations to work
together seamlessly toward common or diverse goals." (from the Open
Archives Forum "OAI For Beginners Tutorial") In the
context of the IMLS DCC project, interoperability means the ability
to seamlessly share and access content from different digital collections.
Metadata standards facilitate interoperability by making it easier
to exchange metadata and use crosswalks. The Open Archives Initiative
Protocol for Metadata Harvesting enables interoperability by allowing
data providers to share metadata with service providers who then
build services around the metadata collected from multiple institutions.
How do you make a collection
interoperable?
- Build an infrastructure.
Essential for your ability to share metadata is an infrastructure
that supports it! Protocols like Z39.50 or the Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH)
enable distributed searching of multiple collections. They make
it possible for a digital library or other service provider to
gather metadata from many collections and present it to the user
in one search interface.
- Create interoperable metadata.
Usually, you will be sharing the metadata for your collection,
not the content itself. But sharing your metadata isn't enough.
Because organizations use metadata in different ways, you need
to think carefully about your metadata. Does it make sense outside
of its native database? Does it include references that someone
in another location would understand? (See shareable
metadata below).
- Clarify rights.
You may have content that is only available to certain individuals
and groups. You can still share the metadata that describes this
content without opening up access to the items themselves. When
you share item-level metadata with IMLS DCC, you should include
a Rights Statement. That way, when users examine your metadata
records to learn about the content of your collection, they will
also learn that access to your content is restricted. For example,
here's the rights
statement for the collection of Wright Brothers negatives
at the Library of Congress.
What
is shareable metadata? (back
to top)
(The following is based
on the CIC-OAI
project recommendations for Dublin Core metadata providers)
1. Metadata is used for both information discovery
and display, so it must contain information formatted for both
purposes.
2. Whenever possible, provide the native metadata schema via the
OAI Protocol.
3. For digital objects that are representations or surrogates of
physical objects, the descriptive metadata should describe the original
physical object. Descriptive metadata specific to the digital surrogate
should also be included, and the URI of the digital surrogate should be
placed in the dc:identifier element. The object type is not Physical
object. The Dublin Core 'one to one' rule does not apply here.
4. Do not merge elements when they are distinct in the original
metadata record (e.g., subject 1 ; subject 2 ...).
5. Do not include empty elements or elements with no informational
value (such as not available or n/a). Note that the value 'unknown'
might have some informational value; use your judgement.
6. Repeat elements and element content as many times as needed
for adequate resource discovery (e.g., the same geographic string
may appear once in subject and once in coverage if you think this
is necessary).
7. To express more complex semantics within simple Dublin
Core elements, indicate refinements within the value. For example,
to indicate a collection that a resource is a part of: <dc:relation>Is
Part Of: Teaching with Digital Content</dc:relation>.
8. A single dc:identifier element shall contain a URI. This URI
points to the resource for display purposes; any other URI must
go in another element. For example, if a URI actually points
to the collection's homepage, it can be recorded in the dc:publisher
or dc:relation element. Additional dc:identifier elements that do
not contain a URI (such as an ISBN) are acceptable.
9. Make clear whether the end user will access a digital resource
or a description of a physical resource (or a finding aid). If there
is no existing digital material, make that clear, e.g., by not writing
any URI in the dc:identifier element (but rather in another element
such as relation or publisher) and/or writing a dc:type of physicalobject,
possibly in addition to any other dc:type.
10. Do not create multiple records pointing to the same URI (identifier).
11. If access to your resource is restricted, this shall be mentioned
in the dc:rights (accessRights) element with the designation of
the categories of persons who are granted access (written for the
benefit of end-users).
12. Indicate the collection the item belongs to in the Relation
(isPartOf) element.
13. Whenever possible, use a controlled vocabulary or encoding scheme.
14. Whenever possible, name the controlled vocabulary, particularly
for subject. In simple Dublin Core this is possible by adding the
controlled vocabulary to the value in brackets. For example, <dc:subject>United
States--Politics and government--1857-1861. [LCSH]</dc:subject>
15. Include the DCMI type in the dc:type element in addition to
any other more specific type (preferably from a controlled vocabulary
such as the LC Thesaurus of Graphic Materials II). For instance,
if the object described is a lithograph, you might include both
'lithograph' and 'image'.
16. Include the Internet Media Type encoding scheme in the dc:format
element in addition to any other formats (such as the physical dimensions
of the object). These are available at: http://www.isi.edu/in-notes/iana/assignments/media-types/media-type
. Please note that the first level (image for example)
can be used if an appropriate media type can't be found in this
list.
17. Include the ISO 639-2 encoding scheme for the dc:language element
where possible.
18. Use an appropriate standard encoding scheme for dates and for
the temporal aspect of the coverage element.
19. Don't use local jargon or language, or use it in addition to
controlled vocabulary.
20. Make sure your metadata meets the 'On the Horse' test. Take
a look at your metadata without the resource it describes and outside
of its website (copy to a word document for instance). Conduct a
usability test. Can the user tell you what this metadata describes?
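Several of the recommendations above can be sketched in a few lines of Python. The example below builds a simple Dublin Core record that keeps repeated elements separate (4), drops values with no informational content (5), folds a refinement into the value (7), names the controlled vocabulary in brackets (14), and gives both a specific type and a general one (15). The `build_record` function, the `record` wrapper element, the title, and the URI are made up for illustration; only the Dublin Core namespace and the example values quoted from the list are real.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # simple Dublin Core namespace
ET.register_namespace("dc", DC_NS)

# Values treated as carrying no information (recommendation 5).
EMPTY_VALUES = {"", "n/a", "not available"}


def build_record(fields):
    """Build a simple Dublin Core record from (element, value) pairs.

    Each pair becomes its own element, so repeated elements stay distinct
    instead of being merged into one delimited string.
    """
    record = ET.Element("record")  # hypothetical wrapper element
    for element, value in fields:
        if value.strip().lower() in EMPTY_VALUES:
            continue  # skip empty or uninformative values
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


fields = [
    ("title", "Example lithograph"),              # made-up title
    ("identifier", "http://example.org/item/1"),  # made-up URI
    ("subject", "United States--Politics and government--1857-1861. [LCSH]"),
    ("relation", "Is Part Of: Teaching with Digital Content"),
    ("type", "lithograph"),
    ("type", "image"),
    ("format", "n/a"),  # dropped: no informational value
]
record = build_record(fields)
xml_text = ET.tostring(record, encoding="unicode")
```

Printing `xml_text` and reading it away from the resource it describes is a quick way to start on the usability check in recommendation 20.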
What is the
Open Archives Initiative (OAI) Protocol for Metadata Harvesting
(PMH)? (back to top)
The Open Archives Initiative Protocol
for Metadata Harvesting (OAI-PMH) supports interoperability between
disparate collections of metadata. It achieves interoperability through
metadata harvesting rather than through distributed
searches as in the Z39.50 protocol.
In Z39.50, searches are requested
simultaneously across multiple metadata providers. Each provider
searches its own metadata and returns results to the search service,
which aggregates results from all responding providers.
In contrast, when OAI-PMH is used,
the metadata itself is completely or selectively harvested from
each metadata provider and aggregated in a central location. Searches
are then performed within this central repository. The Open
Archives Initiative is the organization responsible for the
metadata harvesting protocol used in the IMLS DCC project.
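The difference between the two models can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the two providers, their records, and the function names are invented, and real Z39.50 or OAI-PMH traffic involves network requests that this sketch replaces with in-memory lists.

```python
# Two made-up providers, each holding a few metadata records (titles only).
PROVIDERS = {
    "provider_a": ["Civil War letters", "Railroad maps"],
    "provider_b": ["Civil War photographs", "Quilt patterns"],
}


def search(records, term):
    """Case-insensitive substring match over a list of titles."""
    return [r for r in records if term.lower() in r.lower()]


def distributed_search(term):
    """Distributed model: every provider is queried at search time."""
    results = []
    for records in PROVIDERS.values():
        results.extend(search(records, term))  # stands in for a remote query
    return results


def harvest():
    """Harvesting model: copy the metadata to one place ahead of time."""
    central = []
    for records in PROVIDERS.values():
        central.extend(records)
    return central


CENTRAL_REPOSITORY = harvest()


def harvested_search(term):
    """Search only the central repository; no provider is contacted."""
    return search(CENTRAL_REPOSITORY, term)
```

Both functions return the same records here. The practical difference is where the work happens: the distributed search depends on every provider responding at query time, while the harvested search depends on how recently the central repository was refreshed.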
The OAI protocol is based on XML
and HTTP. A service provider sends an HTTP request to a data provider
using the protocol. The metadata provider—who has implemented
the OAI protocol—responds to the request by sending an XML
document through HTTP. In this way, the service provider can learn
who the metadata provider is (Identify request), what metadata
formats it supports (ListMetadataFormats request), and
how it has divided its metadata (ListSets request). The
service provider can also request the metadata itself (GetRecord,
ListIdentifiers, ListRecords requests).
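As a rough sketch of what these requests look like, the Python below builds the request URLs a service provider might send. The base URL and the record identifier are hypothetical. The verb names and the `metadataPrefix` and `identifier` arguments come from the OAI-PMH specification rather than from the text above, and `oai_dc` is the prefix the specification uses for Dublin Core.

```python
from urllib.parse import urlencode

BASE_URL = "http://example.org/oai"  # hypothetical data provider base URL


def oai_request(verb, **arguments):
    """Return the URL for an OAI-PMH request with the given verb and arguments."""
    params = {"verb": verb}
    params.update(arguments)
    return BASE_URL + "?" + urlencode(params)


# Who is the metadata provider?
identify_url = oai_request("Identify")
# What metadata formats does it support?
formats_url = oai_request("ListMetadataFormats")
# How has it divided its metadata?
sets_url = oai_request("ListSets")
# Request the metadata itself, as Dublin Core.
records_url = oai_request("ListRecords", metadataPrefix="oai_dc")
record_url = oai_request(
    "GetRecord",
    identifier="oai:example.org:item-1",  # made-up identifier
    metadataPrefix="oai_dc",
)
```

Each URL can be fetched with an ordinary HTTP GET, and the data provider answers with an XML document.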
The OAI protocol is metadata neutral.
It can be used with any metadata format. However, OAI-compliant
metadata providers (those who register with the Open Archives Initiative)
must provide metadata in at least Dublin Core.
How is the OAI protocol going
to be used in the IMLS DCC project?
We will build a repository for metadata
describing the item-level content of digital collections created
through the IMLS National Leadership Grant program. We will harvest
metadata from these NLG projects using the OAI protocol. Our goal
is to provide assistance and tools to NLG projects to make it possible
for their metadata to be harvested using the protocol.
What
are the options for implementing OAI metadata provider services?
(back to top)
There are two options for participation
in the item-level repository:
Option 1: Become a full OAI data
provider.
Option 2: Become a static OAI data provider.
Option 1: Full OAI Data Provider
You can become an OAI metadata
provider whether you have a database or a file-based system. Becoming
a full data provider is the best option for projects:
• Actively adding metadata
to their collection
• With a large collection of metadata (over 5000 records)
Requirements for a database system:
• Metadata
• Database application
(e.g. MySQL, Oracle, MS Access, MS SQL)
• Web server with CGI capability
(e.g. Apache/Tomcat, MS IIS)
• Validating, transforming XML parser
(e.g. Xerces, Sun’s JavaXMLPack, MSXML)
Requirements for a file-based system:
• Metadata in XML or available
for IMLS DCC to put into XML
• Web server with CGI capability
(e.g. Apache/Tomcat, MS IIS)
• Validating, transforming XML parser
(e.g. Xerces, Sun’s JavaXMLPack, MSXML)
Option 2: Static OAI Data
Provider
To become a static data provider,
you will store your metadata records in a single, static XML file.
The XML file is then exposed for harvesting using a 3rd-party
gateway (which we will provide). This is the best option for projects:
• No longer adding metadata
to their collection
• With small collections (fewer than 5000 records)
Requirements to become a static
data provider:
• Metadata in XML. (IMLS
DCC will help with conversions.)
• Available space on a web server for posting static XML
files
General Resources About the
Open Archives Initiative (OAI) (back
to top)
Metadata
Harvesting and the Open Archives Initiative (Clifford Lynch)
ARL Bimonthly Report, no. 217, August 2001.
A good overview of the OAI and
its significance.
Frequently
Asked Questions about OAI-PMH
This Q&A covers what OAI-PMH
is and how you can become involved.
The
Open Archives Initiative Protocol for Metadata Harvesting
The technical framework for the
protocol.
OAI
for Beginners - the Open Archives Forum online tutorial
An excellent tutorial that covers
both the history and technical aspects of the protocol. Highly
recommended.
University
of Illinois Open Archives Initiative Metadata Harvesting Project
This was an effort to create
and implement a suite of OAI-based metadata harvesting services,
search services, and tools to facilitate discovery and retrieval
of cultural heritage resources. It serves as the basis for the
IMLS DCC project. Here you can find an OAI-PMH
tutorial (ppt) first presented at JCDL 2003.
Examples
of OAI Service Providers (back
to top)
Here are a few sites that provide
access to metadata harvested through the OAI protocol.
UIUC
Digital Gateway to Cultural Heritage Resources
The result of the University
of Illinois OAI Metadata Harvesting Project.
OAIster
The University of Michigan's
gateway to a collection of freely available, difficult-to-access,
academically-oriented digital resources that are easily searchable
by anyone.
Networked
Computer Science Technical Reference Library (NCSTRL)
A collaborative project involving
NASA Langley, Old Dominion University, the University of Virginia,
and Virginia Tech.
What is an
item-level repository? (back
to top)
When we harvest metadata from the
NLG projects, we aggregate the data in one location, called a repository.
The repository acts as a portal to the item-level records for digital
content in NLG collections.
Why provide access to item-level
records?
The repository promotes the visibility
and usability of NLG collections. It works in concert with the collection
registry so that a user can more easily discover what types of resources
exist in these collections. For example, a user can first search
the registry for all NLG collections with content related to the
Civil War. Normally, in order to see the individual item records
for this content, the user would need to search each collection
individually. The IMLS DCC project makes it possible for the user
to retrieve item-level records for Civil War-related content—in
all NLG collections—with one search.
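The way the registry and the repository work together can be sketched in Python as a two-step lookup. The collections, items, and function names below are invented for illustration, and the sketch assumes each harvested item record is tagged with the collection it came from. It is not a description of how the IMLS DCC services are actually built.

```python
# Made-up collection registry: collection id -> collection-level description.
REGISTRY = {
    "coll-1": {"title": "Letters Home", "subjects": ["Civil War", "Correspondence"]},
    "coll-2": {"title": "Prairie Quilts", "subjects": ["Textiles"]},
    "coll-3": {"title": "Battlefield Views", "subjects": ["Civil War", "Photographs"]},
}

# Made-up item-level repository: harvested records tagged with their collection.
REPOSITORY = [
    {"collection": "coll-1", "title": "Letter from a soldier, 1862"},
    {"collection": "coll-2", "title": "Log cabin quilt"},
    {"collection": "coll-3", "title": "View of a camp, 1863"},
]


def find_collections(subject):
    """Step 1: search the registry for collections with a given subject."""
    return [cid for cid, desc in REGISTRY.items() if subject in desc["subjects"]]


def find_items(subject):
    """Step 2: one search over the repository, across all matching collections."""
    wanted = set(find_collections(subject))
    return [item["title"] for item in REPOSITORY if item["collection"] in wanted]
```

Calling `find_items("Civil War")` returns item records from both matching collections in a single call, which is the convenience the repository is meant to provide.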