IMLS Digital Collections and Content (IMLS DCC)
Working toward Interoperable Digital Content

Grant LG-02-02-0281
Interim Performance Report 2
1 April – 30 September 2003
Submitted by Timothy W. Cole, Principal Investigator, October 2003

Table of Contents:

Summary
General Project Activities
Collection Registry Metadata Schema and Service
Item-Level Metadata Repository
Research
Appendix 1 - IMLS DCC Revised Schedule of Completion (pdf file)

Summary

The IMLS Digital Collections and Content project has made good progress in all areas during the previous six months. As noted in the prior report, a few early milestones were delayed because clearance for the survey instruments took longer than anticipated, but surveys were distributed to 92 National Leadership Grant (NLG) projects in mid-September. We anticipate completing all milestones, albeit on a delayed schedule. In August, a collection-level metadata schema was approved by the Steering Committee and submitted to IMLS for approval. Progress was made on the design of the collection registry interface and database structure. The project team worked with several institutions to either set up or advise on implementation of Open Archives Initiative (OAI) metadata provider services. As of October 23, 2003, the item-level metadata repository holds 43,462 records from fifteen NLG projects. The repository is currently searchable through an early alpha interface. The project research team has begun interviewing participants from selected projects and continues to host a bi-weekly metadata roundtable.

General Project Activities

Financial Report

The Annual Financial Report and a narrative were supplied to IMLS.

Timeline

The original timeline for the project has been substantially changed due to the delay in distributing the survey instrument. A revised schedule of completion is included in Appendix 1. New dates for milestones are noted throughout this Interim Performance Report.

Dissemination

The IMLS DCC project has been presented in a number of forums:

The IMLS DCC project had two posters at the 2003 Dublin Core (DC) Conference in Seattle, WA, held September 29-October 2, 2003.

Knutson, E., Palmer, C. & Twidale, M. (2003). Tracking Metadata Use for Digital Collections [Poster Abstract]. In DC-2003: Proceedings of the International DCMI Metadata Conference and Workshop p. 243-244.
Shreeves, S.L. & Cole, T.W. (2003). Developing a Collection Registry for IMLS NLG Digital Collections [Poster Abstract]. In DC-2003: Proceedings of the International DCMI Metadata Conference and Workshop p. 241-242.

Tim Cole presented “IMLS NLG Collection Registry & Item Level Metadata Repository at the University of Illinois” and “Notes on Panel on Future of OAI” at the 4th Open Archives Forum Workshop in Bath, UK on September 4, 2003.

Sarah Shreeves presented “Integrating Resources for Information Discovery” at the Digital Resources for Cultural Heritage: Current Status, Future Needs. A Strategic Assessment Workshop in Washington, D.C., on August 25, 2003.

In March 2004 we plan to hold a workshop on the Open Archives Initiative at WebWise in Chicago, IL, and to present our work thus far. We are actively considering other near-term dissemination venues, including the Joint Conference on Digital Libraries 2004 and the American Society for Information Science and Technology (ASIS&T) Annual Meeting 2004.

Steering committee activity

The Steering Committee met via conference call on July 29, 2003 to discuss the proposed collection description metadata schema. The Steering Committee web site (password protected) has been updated regularly with relevant documents produced by the IMLS DCC project. The next meeting of the Steering Committee is tentatively planned for March 5th, 2004 after the WebWise Conference in Chicago, IL.

Collection Registry Metadata Schema and Service

Survey of IMLS NLG projects

In early September, the Office of Management and Budget approved our survey of relevant IMLS NLG projects and our plan for follow-up emails. On September 15th a packet with two survey instruments (the first collecting and verifying project and collection information for initial registry entries, and the second supporting our research investigations) was sent to the principal investigators (PIs) of 92 NLG projects. We created a SQL database to record the results of the surveys as they are returned.

During October and November 2003, we will contact non-respondents first by email and then by phone and will continue to enter survey data into our database. The survey results will allow us to create preliminary records for the collection registry, categorize NLG projects according to their viability for implementing OAI data provider services, and provide information for our research.

Developing the collection-description metadata schema

Much of the project’s work in the last six months was concentrated on further developing the collection-description metadata schema. Our work was informed by our participation in ongoing discussions on the Dublin Core Collection Description Working Group listserv and meeting at DC-2003 and by the conversations at the Metadata Roundtable, a bi-weekly meeting of faculty and students interested in metadata issues held at the Graduate School of Library and Information Science. We captured much of our process in creating the schema in our poster (ppt file) and poster abstract for the DC-2003 conference. In June 2003 three NLG projects tested the IMLS DCC collection description metadata schema. There were no unexpected findings from this test run; the participants seemed to understand the schema. After examination and discussion of the CIDOC Conceptual Reference Model (a top-level ontology and proposed ISO standard for the semantic integration of cultural information) we made further adjustments to the schema including elements identifying the physical collection(s) from which the digital collection(s) was derived.

In July 2003 we presented the revised schema to the IMLS DCC Steering Committee for their approval. We convened a meeting via conference call of steering committee members on July 29, 2003 to vet the schema. The Steering Committee agreed in general that the July 2003 revision of the collection description schema was appropriate and was not missing any major elements. Some minor changes, deletions, and additions were made. A final copy of the metadata schema was circulated to the Steering Committee in mid-August for final review and then was submitted to IMLS for approval on August 20, 2003.

Designing and building the collection registry

A preliminary version of the database for the collection registry was built in August 2003 and was tested using the collection descriptions submitted by our testers. We developed a preliminary ‘staff’ interface to aid in navigating the relationships between collections and projects. We shared this interface with IMLS on September 3, 2003. In addition, we began designing a public interface to the collection registry (again based on the three test records). We examined other collection registries such as Cornucopia (http://www.cornucopia.org.uk) and Enrich UK (http://www.enrichuk.org) for functionality and interface design features. We built a browse screen based on the GEM subject headings, a short display, and a full display. This alpha mock-up was shared with IMLS on September 10, 2003.

During the next six month period we will build preliminary collection description records in the collection registry using the survey results. We will design, develop, and test the forms that will enable NLG projects to enter and maintain collection metadata and, pending OMB approval, ask NLG projects to verify and augment their collection description records using these Web forms. Our revised estimate for a beta version of the collection registry (pending approval by OMB) is March 2004 for WebWise in Chicago. An initial production version of the registry should be available by June 2004.

Item-Level Metadata Repository

Assisting projects in implementing OAI-data provider services

Although we were handicapped by the delay in distributing the survey, we did continue discussions with several NLG projects about implementing OAI data provider services, and collaborated during the last six months with two more projects to make their metadata harvestable.

Static OAI data provider service
In July 2003 we set up an OAI static repository for the NLG project “American Natural Science in the First Half of the Nineteenth Century” based at the Academy of Natural Science. The static repository is a recent development in the OAI protocol designed for small, relatively static metadata collections: it is a single XML file that contains the metadata records and sits on the data provider’s standard web server. A third party acts as a gateway through which an OAI service provider can then harvest the metadata in that static XML file. This obviates the need for the source data provider to implement a new dynamic web service. A full technical description of the static gateway can be found at http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm. The project team worked with Eileen Mathias to map metadata from MARC records to simple DC and produced a single XML file (with both MARC and DC records available for harvest), which is now available through our third-party gateway. The success of this implementation indicates that the static provider service is a good solution for institutions lacking the technical infrastructure to implement new, dynamic web services.
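As an illustration of the MARC-to-simple-DC mapping step, the following is a minimal Python sketch. It is not the project's actual crosswalk: the field selection is a small assumed subset in the spirit of commonly published MARC-to-Dublin Core mappings, the function name marc_to_oai_dc and the sample values are hypothetical, and the wrapper elements that a complete static repository file requires are omitted.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed, illustrative subset of a MARC-to-simple-DC crosswalk:
# (MARC tag, subfield code) -> simple Dublin Core element name.
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",
    ("650", "a"): "subject",
}


def marc_to_oai_dc(marc_fields):
    """Build an oai_dc:dc element from (tag, subfield, value) triples."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, code, value in marc_fields:
        element = CROSSWALK.get((tag, code))
        if element is None:
            continue  # fields without a mapping are dropped in this sketch
        child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
        child.text = value.strip()
    return root


# Hypothetical sample record, not taken from any harvested collection.
sample = [
    ("245", "a", "Hypothetical title"),
    ("100", "a", "Hypothetical, Author"),
    ("650", "a", "Natural history"),
    ("999", "a", "local field with no DC equivalent"),
]
record = marc_to_oai_dc(sample)
print(ET.tostring(record, encoding="unicode"))
```

Keeping the crosswalk in a plain dictionary makes the mapping decisions easy to review with the staff who know the source records.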

CONTENTdm OAI-data provider service
In July 2003 we worked with the Washington State Libraries to harvest metadata from their CONTENTdm data provider service. CONTENTdm is a digital library management system with a built-in OAI data provider service. However, the current version of CONTENTdm (3.5) does not support resumption tokens. These are an optional feature of version 2.0 of the OAI protocol, but they aid in 'flow control' by allowing a data provider to issue records in manageable chunks to a service provider, thus limiting the peak load on both systems. Although optional, resumption tokens are particularly important for large data providers. We examined other possible avenues for harvesting these records. We determined that dividing metadata into smaller sets (a maximum of 10,000 records per set) could facilitate harvesting without flow control. We also developed a successful workaround in which we harvested records individually. While this workaround was slow, it put little to no stress on the web server, and all metadata records were harvested successfully. We have contacted CONTENTdm about the lack of full functionality in their turn-key OAI data provider service. (In addition to not implementing resumption tokens, CONTENTdm can only provide metadata in simple Dublin Core.)
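To make the role of resumption tokens concrete, the following Python sketch shows how a harvester might follow them when a data provider does support flow control. It is an illustration under stated assumptions, not the project's harvester: the functions harvest and make_fake_provider and the identifiers are hypothetical, and the provider is simulated in memory so that no network access is needed.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, metadata_prefix="oai_dc"):
    """Yield every <record> element, following resumption tokens.

    `fetch` is a caller-supplied function (hypothetical here) that takes a
    dict of OAI-PMH request arguments and returns the response XML as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        list_records = root.find(f"{OAI_NS}ListRecords")
        if list_records is None:
            return  # error response or nothing to harvest
        yield from list_records.findall(f"{OAI_NS}record")
        token = list_records.find(f"{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # a missing or empty token marks the final chunk
        # A resumption token is an exclusive argument: send it with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def make_fake_provider(identifiers, chunk_size):
    """Simulate a data provider that issues records in fixed-size chunks."""
    def fetch(params):
        start = int(params.get("resumptionToken", 0))
        chunk = identifiers[start:start + chunk_size]
        end = start + len(chunk)
        records = "".join(
            f"<record><header><identifier>{i}</identifier></header></record>"
            for i in chunk
        )
        if end < len(identifiers):
            token = f"<resumptionToken>{end}</resumptionToken>"
        else:
            token = "<resumptionToken/>"
        return (
            '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            f"<ListRecords>{records}{token}</ListRecords></OAI-PMH>"
        )
    return fetch


ids = [f"oai:example.org:{n}" for n in range(7)]
harvested = [
    rec.find(f"{OAI_NS}header/{OAI_NS}identifier").text
    for rec in harvest(make_fake_provider(ids, chunk_size=3))
]
print(len(harvested), "records harvested")
```

Because each response carries only one chunk, neither side has to hold the full result set in memory, which is the peak-load benefit described above.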

Other OAI provider implementation discussions
In addition to the Academy of Natural Science and the Washington State Libraries, we consulted with several other NLG grantees including the Missouri Botanical Gardens, University of Connecticut, Indiana University, University of Washington, and Illinois State Library regarding plans for setting up OAI data provider services.

We are also tracking why NLG projects might not be able or ready to implement data provider services. Survey results will help with this task. A preliminary review based on conversations held so far indicates that NLG projects may not be in a position to implement OAI data provider services because:

  • There is no item level metadata. This is true for many exhibit and learning object focused projects.
  • The collection is not yet public. NLG projects wish to wait until they unveil their digital collection before sharing the metadata.
  • Infrastructure is not in place. The metadata may not be mapped into Dublin Core or stored in such a way to set up OAI data provider services.
  • The technical infrastructure is in transition or will be in transition. NLG projects are reluctant to implement OAI provider services in the midst of a migration to a new content management system.
  • Agreement must be reached among all project collaborators to share metadata via OAI.

During the next six months we will use survey results to segment NLG projects into four groups. Our preliminary results (as of Oct 24, 2003) indicate the following breakdown:

  • Group 1 - Projects with OAI data provider sites for NLG content: 15.
  • Group 2 - Projects whose institutions have an OAI implementation (not yet being used for NLG content) and NLG projects that have explicitly expressed plans to add OAI functionality: 16.
  • Group 3 - Projects that meet certain technical criteria - e.g. have item-level metadata and a maintained web site: 6.
  • Group 4 - Projects with no item-level metadata or no interest in providing metadata via OAI: 6.
  • Unknown: 50
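The segmentation described above could be expressed as a small rule set once survey data is in the database. The Python sketch below is an assumption-laden illustration: the dictionary keys, the order in which the rules are applied, and the sample projects are all hypothetical and are not drawn from the project's survey instrument or results.

```python
from collections import Counter


def assign_group(survey):
    """Return a group label for one project's survey answers.

    `survey` is a dict with hypothetical keys, or None when no survey has
    been returned yet. The rule order is an assumption of this sketch.
    """
    if survey is None:
        return "Unknown"
    if survey.get("oai_provider_for_nlg_content"):
        return "Group 1"
    if survey.get("institution_has_oai") or survey.get("plans_oai"):
        return "Group 2"
    no_metadata = not survey.get("has_item_level_metadata")
    no_interest = survey.get("interested_in_oai") is False
    if no_metadata or no_interest:
        return "Group 4"
    if survey.get("maintained_web_site"):
        return "Group 3"
    return "Unknown"  # answers too incomplete to classify


# Hypothetical projects used only to exercise the rules.
projects = {
    "Project A": {"oai_provider_for_nlg_content": True},
    "Project B": {"institution_has_oai": True, "has_item_level_metadata": True},
    "Project C": {"has_item_level_metadata": True, "maintained_web_site": True},
    "Project D": {"has_item_level_metadata": False},
    "Project E": None,
}
tally = Counter(assign_group(answers) for answers in projects.values())
for group in sorted(tally):
    print(group, tally[group])
```

Re-running the tally as surveys arrive would show the Unknown count shrinking without any manual recounting.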

Metadata harvesting and design of item-level repository

We have continued to harvest metadata from OAI-compliant NLG projects into our alpha item-level repository. As of October 23, 2003, we have harvested 43,462 DC records from 15 OAI-compliant NLG projects. The repository is available at http://imlsdcc.grainger.uiuc.edu/searchimls/ (password protected). We have made some slight adjustments to the interface of this repository and have shared it with the NLG projects we are harvesting.

Over the next six months, we will continue to harvest OAI-compliant NLG projects. We plan further enhancements to the interface of the item-level repository, including the ability to search in specific subject areas (using the GEM subject headings). We also plan on using Spotfire, a data analysis tool, to aid us in analyzing metadata harvested and identifying areas where we may need to normalize values.
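One simple way to spot values that may need normalizing is to group the raw values of a field by a case- and whitespace-insensitive key and list the keys that appear in more than one spelling. The Python sketch below illustrates that idea only; it is not a description of Spotfire, and the function name find_variants and the sample values are hypothetical.

```python
from collections import defaultdict


def find_variants(values):
    """Group raw values by a normalized key; keep keys with several spellings."""
    groups = defaultdict(set)
    for value in values:
        # Normalize case, collapse runs of whitespace, drop trailing periods.
        key = " ".join(value.lower().split()).rstrip(".")
        groups[key].add(value)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Hypothetical values for a single Dublin Core field.
type_values = ["Image", "image", "Text", "text.", "Sound"]
variants = find_variants(type_values)
for key, forms in variants.items():
    print(key, forms)
```

A report like this only flags candidates; deciding on the preferred form for each value remains an editorial choice.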

Research

Data collection

The research plan for year one consisted of four iterative stages of data collection and analysis: 1) content analysis of the NLG proposals, 2) Survey 1, 3) e-mail follow-up survey, and 4) phone interviews with a representative group of projects. Stage 1 was completed as expected, but due to the delay in survey distribution we decided to make accommodations in the original plan. This involved moving forward with stage 4 without the benefit of having the baseline of data from the survey. Initially, interviews were to follow preliminary analysis of the survey and follow-up results, to build on and enrich the necessarily brief responses provided by survey methods. To keep the project moving forward, we altered our research design and in recent months began gathering interview data from some NLG projects. However, because surveys were sent at the same time we began interviews, several questions were asked in both instruments and therefore the iterative, longitudinal questioning approach was not achieved.

As of October 16, 2003, 13 interviews, conducted with participants from 9 project sites, have been completed. Transcription of the interviews is well underway, and we have begun initial analysis, especially in the area of collection definition and application of metadata schemes. These results were presented at the Dublin Core 2003 conference. See below for more details.

As mentioned above, the survey results are beginning to come in and are being entered into a SQL database. At the time of writing, the response rate was slightly more than twenty percent. Non-respondents have been contacted by the project coordinator, which should greatly increase the response rate. Once the survey data has been reviewed, we will send email follow-up questions to clarify and expand on the survey responses.

Our next major data collection activity will be conducting focus groups in March 2004 at WebWise in Chicago. We plan to conduct two focus groups of approximately 6-10 participants each. The participants will be a convenience sample of IMLS NLG grantees in attendance at WebWise. We have begun the OMB approval process, and a final package of focus group questions and details on the research method will be sent to IMLS by December 1, 2003.

Dissemination of research results

As mentioned above, the research team gave a poster presentation at the Dublin Core conference in Seattle, WA, September 29 – October 2, entitled “Tracking Metadata Use for Digital Collections” (ppt file). We reported on Stage 1 results from the content analysis of the project proposals and preliminary results from the interviews. It was evident from comments made in Neil McLean's plenary session, and from the dearth of user-based projects, that there is a great need for research of this type in the metadata community.

We expect to present further results at conferences in the coming year in the form of contributed papers. Possible venues include ASIS&T Annual Meeting and the 2004 JCDL Conference. The team is also in the beginning stages of writing a paper on the topic of collection definition for digital distributed repositories.

Related Activities

We have been conducting a bi-weekly metadata roundtable where members of the Graduate School of Library and Information Science (GSLIS) and the University Library community meet to discuss issues that surround the use and creation of metadata. Topics we have discussed include collection-level metadata, the Dublin Core recommended values for the collection type property, the Dublin Core Collection Level Application Profile, the CIDOC Conceptual Reference Model (CRM), and the definition of a collection. Regular participants include faculty and both masters and doctoral students from GSLIS as well as university librarians. Guest participants have included Jane Greenberg from the School of Information and Library Science, University of North Carolina at Chapel Hill, who is a member of our steering committee, and other GSLIS visiting scholars. Both Carole Palmer and Ellen Knutson attended the Collection Description Working Group meeting at the Dublin Core conference in Seattle. The concepts discussed and ideas generated at both the Collection Description Working Group meeting and the metadata roundtables have informed the research and implementation processes of the project and the research team's plans for publication on the topic of collection definition.

© 2003 IMLS DCC. Last updated on November 15, 2007. Hosted by Grainger Engineering Library.

This project is a collaboration among the University of Illinois Library, the Graduate School of Library and Information Science (Center for Informatics Research in Science and Scholarship), and the Institute of Museum and Library Services, a Federal agency that fosters innovation, leadership, and a lifetime of learning.