Wednesday, October 1, 2008

Week 6 Reading Notes

Researching Challenges in Digital Archiving and Long-term Preservation

Hedstrom's paper was disheartening. She argues that a tremendous amount of work remains before digital libraries can reliably ensure long-term preservation. There is no date on her paper; perhaps the situation has improved since she wrote it.

She outlines five concerns:
  1. We cannot manage and preserve digital collections as fast as they grow; we need more staff, more money, or more automation.
  2. Uninterrupted preservation of digital objects into the unforeseeable future is a great challenge, since hardware and software change quickly and the objects must be migrated frequently.
  3. Lack of precedents and models means more knowledge is needed on legal issues, cost-benefit analysis, etc.
  4. Better technology is needed, especially to automatically write, extract, restructure and manage metadata.
  5. Networks and templates are needed to encourage standardization and interoperability among digital libraries.
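
Hedstrom's fourth point is the most concrete. As a rough illustration of what automatic metadata extraction can mean at its simplest, here is a minimal Python sketch, assuming "technical metadata" is limited to file size, a format guessed from the file extension, a SHA-256 checksum, and the modification time. The function name `extract_technical_metadata` is my own, not from the paper, and dedicated format-identification tools inspect a file's contents instead of trusting its extension.

```python
import hashlib
import mimetypes
import os
from datetime import datetime, timezone


def extract_technical_metadata(path: str) -> dict:
    """Derive basic technical metadata for one file with no human input.

    Hypothetical helper for illustration only.
    """
    stat = os.stat(path)
    digest = hashlib.sha256()
    # Read in chunks so large files never have to fit in memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    # Guessed from the extension only; this is an assumption, not real format identification.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": stat.st_size,
        "mime_type": mime_type or "application/octet-stream",
        "sha256": digest.hexdigest(),
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
```

Run over every file in a batch, a function like this produces records that no cataloger had to type, which is the kind of automation Hedstrom is asking for, though descriptive metadata is much harder to extract than these technical fields.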

Actualized Preservation Threats

Littman's article was not much more encouraging than Hedstrom's. Still, it is useful that Littman is publicizing the pitfalls of implementing a digital repository. The major problems he reports were media failure (such as portable hard drive issues), hardware failure (data loss and service disruption due to hard drive failure), software failure (including METS and XML issues), and operator error (human mistakes).
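
My notes don't record how these failures were detected. One common safeguard against silent media and hardware corruption is a periodic fixity audit: record a checksum for each file at ingest, then recompute and compare on a schedule. The sketch below assumes a manifest mapping relative paths to SHA-256 digests; `audit_fixity` and the manifest layout are hypothetical, not taken from Littman's article.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored files against the checksums recorded at ingest.

    manifest maps a path relative to root to its expected SHA-256 hex digest.
    """
    report = {"ok": [], "missing": [], "corrupted": []}
    for relative, expected in sorted(manifest.items()):
        path = root / relative
        if not path.is_file():
            report["missing"].append(relative)    # e.g. lost with a failed drive
        elif sha256_of(path) != expected:
            report["corrupted"].append(relative)  # bits changed since ingest
        else:
            report["ok"].append(relative)
    return report
```

An audit like this only detects damage; recovering from it still depends on having a second, independent copy to restore from.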

Littman mentioned that metadata was encoded as METS, MODS, MARCXML, PREMIS, and MIX. Why were so many standards used? Does using several of them increase interoperability with other libraries?
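
Part of the answer may be that these schemas do different jobs. METS is a container, and the others can be wrapped inside it: MODS (or MARCXML, using MDTYPE "MARC") for descriptive metadata, MIX for technical image metadata (MDTYPE "NISOIMG"), and PREMIS for preservation metadata. The sketch below builds a skeletal METS document with Python's `xml.etree.ElementTree` to show that layering. The IDs, title, and URL are hypothetical placeholders, the MIX and PREMIS payloads are left empty, and the output is a skeleton rather than a schema-validated record; I don't know exactly how Littman's repository arranged these sections.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("mods", MODS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Build a namespaced tag name in ElementTree's {uri}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(title: str, image_href: str) -> ET.Element:
    root = ET.Element(q(METS, "mets"))

    # Descriptive metadata: a MODS record wrapped in a dmdSec.
    dmd = ET.SubElement(root, q(METS, "dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    mods = ET.SubElement(xml_data, q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title

    # Administrative metadata: technical (MIX) and provenance (PREMIS).
    # Payloads are omitted in this sketch; only the wrappers are created.
    amd = ET.SubElement(root, q(METS, "amdSec"), ID="AMD1")
    for section, md_id, md_type in (("techMD", "TECH1", "NISOIMG"),
                                    ("digiprovMD", "PROV1", "PREMIS")):
        md = ET.SubElement(amd, q(METS, section), ID=md_id)
        md_wrap = ET.SubElement(md, q(METS, "mdWrap"), MDTYPE=md_type)
        ET.SubElement(md_wrap, q(METS, "xmlData"))

    # File inventory; ADMID points the file at its administrative metadata.
    file_sec = ET.SubElement(root, q(METS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS, "fileGrp"))
    file_el = ET.SubElement(group, q(METS, "file"), ID="FILE1", ADMID="TECH1 PROV1")
    ET.SubElement(file_el, q(METS, "FLocat"),
                  {"LOCTYPE": "URL", q(XLINK, "href"): image_href})

    # Structural map tying the description to the file.
    struct = ET.SubElement(root, q(METS, "structMap"))
    div = ET.SubElement(struct, q(METS, "div"), TYPE="page", DMDID="DMD1")
    ET.SubElement(div, q(METS, "fptr"), FILEID="FILE1")
    return root
```

Calling `ET.tostring(build_mets("Example page", "https://example.org/page1.jp2"), encoding="unicode")` returns the serialized XML. Seen this way, the schemas are less redundant than they first appear, though whether the combination helps interoperability still depends on other libraries reading the same profiles.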

"Ingest of digitized newspapers into the repository began while the repository was still under development," wrote Littman. "This is probably fairly common." Why is this common? Why were developers so pressed for time that they had to ingest before testing the repository? Is this the result of poor project planning or missed deadlines?

The Open Archival Information System Reference Model

A forum of national space agencies (the Consultative Committee for Space Data Systems) developed this model out of a desire to establish shared concepts and definitions for digital preservation and archiving. Working in an open forum, they defined a reference model for a repository that preserves information and provides access to it.

They delineated several responsibilities of an Open Archival Information System:
  1. Define collection scope and motivate information owners to pass items to the archive.
  2. Obtain sufficient custody of the items and the necessary intellectual property rights to them.
  3. Define the scope of the primary user community.
  4. Be sure users can independently understand the items.
  5. Create preservation policies and procedures.
  6. Make the archive available to the intended community.

The archive has three parts: environment, functional components, and information objects.

Environment: management + producer + consumer
  • Management: creates and enforces policy, defines collection scope, conducts strategic planning
  • Producer: provides information and metadata to the archive, guided by a submission agreement
  • Consumer: information users, including the designated community (a smaller group of primary users who should be able to understand archived items independently)
Functional components: ingest, archival storage, data management, preservation planning, access, administration
  • Ingest: receive information, validate completeness of information, extract/create metadata
  • Archival storage: ensure information is stored correctly, refresh media, migrate between formats, check for errors, create disaster recovery plans
  • Data management: maintains metadata and databases
  • Preservation planning: creates preservation plan, keeps abreast of new storage and access technologies
  • Access: the user interface, helps users find and access information
  • Administration: coordinates the operations of the other five services, monitors performance
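
To make the ingest step more concrete, here is a minimal sketch of checking a SIP's completeness against a submission agreement before accepting it. The agreement terms (required metadata fields, allowed file extensions) and the `SubmissionAgreement`, `SIP`, and `validate_sip` names are hypothetical; OAIS describes these functions conceptually and does not prescribe particular checks or code.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class SubmissionAgreement:
    """Hypothetical terms agreed between the archive and a producer."""
    required_metadata: tuple[str, ...] = ("title", "creator", "date")
    allowed_formats: tuple[str, ...] = (".tif", ".jp2", ".xml")


@dataclass
class SIP:
    metadata: dict[str, str]
    files: dict[str, bytes]             # filename -> content
    declared_checksums: dict[str, str]  # filename -> SHA-256 hex supplied by the producer


def validate_sip(sip: SIP, agreement: SubmissionAgreement) -> list[str]:
    """Return a list of problems; an empty list means ingest can proceed."""
    problems = []
    for key in agreement.required_metadata:
        if not sip.metadata.get(key):
            problems.append(f"missing metadata: {key}")
    for name, content in sip.files.items():
        if not name.lower().endswith(agreement.allowed_formats):
            problems.append(f"format not in agreement: {name}")
        expected = sip.declared_checksums.get(name)
        if expected is None:
            problems.append(f"no checksum declared: {name}")
        elif hashlib.sha256(content).hexdigest() != expected:
            problems.append(f"checksum mismatch: {name}")
    # Files the producer promised but did not deliver.
    for name in sorted(sip.declared_checksums.keys() - sip.files.keys()):
        problems.append(f"declared file not delivered: {name}")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the archive send the producer a single list of fixes.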
Information objects:
  • Submission information package (SIP): original ingested item
  • Dissemination information package (DIP): what user accesses
  • Archival information package (AIP): content information + preservation description information, bound together by packaging information and described by descriptive information
  1. Content information: content data object (the information itself) + representation information (what is needed to interpret the bit sequences as meaningful information)
  2. Preservation description information: reference (unique identifier such as an ISBN) + provenance (history of the item's creation, owners, etc.) + context (relationship to other documents) + fixity (mechanisms showing the content has not been altered, such as a checksum, digital signature, or watermark)
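
The AIP structure can be sketched as nested data types. This is a minimal illustration, assuming fixity is a SHA-256 checksum (one of several possible mechanisms) and that the enclosing class stands in for packaging information; all class and field names are my own, not an official OAIS encoding.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentInformation:
    data_object: bytes          # the bits being preserved
    representation_info: str    # how to interpret the bits (here just a format name)


@dataclass(frozen=True)
class PreservationDescription:
    reference: str              # unique identifier
    provenance: tuple[str, ...] # history: creation, custody changes, migrations
    context: tuple[str, ...]    # identifiers of related items
    fixity: str                 # SHA-256 of data_object recorded at ingest


@dataclass(frozen=True)
class ArchivalInformationPackage:
    content: ContentInformation
    pdi: PreservationDescription
    descriptive_info: dict[str, str]  # used by search and discovery

    def fixity_ok(self) -> bool:
        """Recompute the checksum and compare it with the stored value."""
        return hashlib.sha256(self.content.data_object).hexdigest() == self.pdi.fixity

    def to_dip(self) -> dict:
        """Build a simple dissemination package for a user request."""
        if not self.fixity_ok():
            raise ValueError(f"{self.pdi.reference}: fixity check failed")
        return {
            "id": self.pdi.reference,
            "title": self.descriptive_info.get("title", ""),
            "format": self.content.representation_info,
            "data": self.content.data_object,
        }
```

Refusing to build a DIP when fixity fails is a design choice for this sketch; an archive might instead flag the item and restore it from another copy.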
