Tuesday, November 25, 2008

Week 13 Reading Notes

Lynch: Where Do We Go from Here?

Lynch reminds readers that digital libraries didn't suddenly appear in the 90s alongside the Web, but instead trace their advent to the 60s. Digital library research began receiving major governmental funding in the mid-90s, which "legitimized digital libraries as a field of research." Lynch is very appreciative of the NSF funding that encouraged collaboration and community building, especially among a diverse group of sectors.

The bulk of programmatic government funding for digital libraries has tapered off, except for research involving defense, intelligence and homeland security. Money is now being directed at digital asset management, institutional repositories, and creating new collections.

In the future, the largest issues for digital libraries will be preservation and ethical stewardship. Lynch would like to see more studies on privacy and personal information management, user behaviors, interactive library environments, and libraries' role in learning and human development.

Lynch writes, "The lion's share of the NSF funding went to computer science groups, with libraries often being only peripherally involved, if at all." I wonder if digital libraries would be more user-friendly today if librarians had gotten a bigger slice of the government pie back then.

Stiglitz: Intellectual Property Rights and Wrongs

This Nobel laureate wants to empower the developing world through less stringent intellectual property rights and more open access/open source content. He criticizes biased intellectual property rights that benefit developed nations and their monopolistic corporations. These rights, he says, can hinder innovation, which relies on liberal dissemination of ideas.

Stiglitz writes, "Monopolists may have much less incentive to innovate than they would if they had to compete." He uses Microsoft's squelching of the Netscape browser as an example. I agree that Microsoft needs more competition, especially since Vista is far from innovative. However, despite Microsoft's dominance, innovation has not died: Firefox is free, open source, and flourishing.

Knowledge Lost in Information

This report asks the NSF to continue funding digital libraries at a price of $60 million a year. Lapsed funding could cause international competitors to overtake the U.S. in research accomplishments. It could also lead to more chaos in libraries as users lose control over the world's growing masses of information.

Some of the research needs include:
  • transferring research models to broader contexts, thus increasing user populations
  • better strategies for accessing information in various formats
  • cognitive completion, or prompts that help a user pinpoint her desired subject
  • proactive storage systems that automatically cull new information
  • user-centered design, customizable user interfaces
  • automatic production of metadata
  • creating a universal architecture
  • active replication to ensure preservation
  • interoperability

Some reasons for NSF to fund digital libraries:

  • national security will be stronger
  • students will be better equipped to compete in a global economy
  • research in health and environmental sciences will be motivated

I enjoyed this quote from Herbert Simon: "A wealth of information creates a poverty of attention." The authors of this report maintain that digital library research should provide ways to manage the information glut. One way is to "reduce available information to actionable information." This is a tricky task: who decides what is actionable and what should be tossed? For instance, 20 years from now, hurricane warning information for Katrina will not be actionable, but it will still have historical relevance.

Week 13 Muddiest Point

I have no muddiest point or question.

Friday, November 21, 2008

Week 12 Reading Notes

Arms Chapter 6: Economic and Legal Issues

Arms contends that challenges in digital libraries cannot be solved through new laws or technology. Instead, social information customs should be created first.

There are two economic models for digital libraries: open access and payment systems. Open access libraries are usually funded by grants, advertising, the government, or private individuals. Payment systems are funded by annual or monthly subscriptions, hourly rates, rates based on the number of workstations providing access, rates based on the number of simultaneous users, or rates based on the number of digital objects accessed. Arms says subscription is the most popular payment form because it is predictable. Libraries prefer subscriptions because they allow for the widest, simplest access.

Arms discusses the frustrating cycle of scholarly publishing in which academic libraries must pay exorbitant prices for their own faculty's articles by going through middlemen like Elsevier. He blames universities for using the quantity of published articles as a basis for faculty promotion and awards. "Because prestige comes from writing many papers," writes Arms, "they have an incentive to write several papers that report slightly different aspects of a single piece of research."

Though most academic libraries use electronic databases for journal access, there are three main flaws in this system:
  • If the publisher goes out of business or changes its database coverage, the library is left with neither current articles nor archives.
  • In order to maintain profits, publishers sell database subscriptions in bundles, which forces libraries to subscribe to less-favored databases in order to secure a reasonable price on the essential ones.
  • Patrons' rights involving the use of articles are murky. Some restrictions would prevent users from printing or saving copies of articles for individual use.

Arms Chapter 7: Access Management and Security

Access management comprises the policies that authorize users to access digital libraries, whether open or fee-based. The two parties in access management are the information managers, who create and implement policies, and the users, who must authenticate their roles.

Users may authenticate their roles by what they know (such as a username and password), what they own (such as a smart card), where they are (e.g., at a particular IP address), and who they are (via biometrics, for instance). Common user roles are determined by group association (such as a Pitt student), location (someone at a Carnegie Library of Pittsburgh computer), subscription (someone paying a monthly fee to an online journal), robotic use (a spider crawling the Web), or payment (a user is paying per view). Users may be restricted in their computing actions and in the extent to which they may use a digital object.
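
A minimal sketch of how such role-based rules might be checked, assuming the user has already been authenticated. The role names, the `POLICY` table, and the `is_allowed` function are all invented for illustration; they do not come from Arms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset = frozenset()


# Hypothetical policy table: for each kind of digital object,
# the roles permitted to perform each action.
POLICY = {
    "journal-article": {
        "view": {"pitt-student", "subscriber", "pay-per-view"},
        "print": {"pitt-student", "subscriber"},
        "index": {"robot"},
    },
}


def is_allowed(user, obj_type, action):
    """Return True if any of the user's roles is permitted to perform the action."""
    allowed_roles = POLICY.get(obj_type, {}).get(action, set())
    return bool(user.roles & allowed_roles)


student = User("ana", frozenset({"pitt-student"}))
crawler = User("spider", frozenset({"robot"}))
print(is_allowed(student, "journal-article", "print"))
print(is_allowed(crawler, "journal-article", "view"))
```

Anything not listed in the table is denied by default, which is one way to err on the side of too many controls rather than too few.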

Enforcing access management policies is tricky: too many controls can turn off users, but too few may invite abuse. Some digital library operators choose fewer controls, knowing that happy users will be repeat customers and therefore compensate for profits lost to illegal users. Arms advocates displaying an access statement to users; in some cases this carries the force of law.

Digital library security and authenticity can be ensured via firewalls, encryption, watermarking and digital signatures.

Lesk on Economics

Lesk links library economies to publishing industry economies and says that digital libraries will need new funding strategies in an era of skyrocketing information prices. Libraries have difficulty with funding, he says, because users do not appreciate the value of library holdings and operations.

Lesk mentions the idea of a library as a "buying club" where users are motivated to participate because sharing resources is cheaper than buying personal copies for permanent use. This is an interesting notion, but it paints the library as only an on-demand information source, not an institution upholding preservation and literacy. Publishers do not always win through the buying club model, but it does seem best for the average user.

Another economic model that Lesk considers is transclusion, in which users must pay to see quotations in an article. In exchange for a fee, the user will be directed from the original work to the cited work. Such a model, at least for the average undergraduate or graduate researcher, is prohibitive. I doubt a freshman would pay to see such information; the fees would likely drive her to cheaper, less reputable sources.

Lesk discusses copyright law's adverse effects on users. Since the Berne Convention, works do not need to bear copyright notices, dates or authors. Thus, finding the work's author or owner may be impossible.

This article is outdated, so I wonder if what Lesk says about authors avoiding online publishing is still true. Are faculty today loath to publish in open access journals because their deans may not consider it tenure-worthy work?

One item in this chapter surprised me. I have seen lots of suggestions on how libraries can save money, but I had never seen advertising suggested as a way to get funding. How might this work? Instead of making a patron pay to see an online article, might he view a 5-second ad first? Or would library shelves have sponsor posters tacked to each end, just as corporate names line a football stadium? This suggestion is far more disturbing to me than resorting to pay-per-view research. Libraries are the last public places without advertising, and we should try to keep them that way.

Wednesday, November 19, 2008

Week 12 Muddiest Point

I have no muddiest point. I have a few questions:
  • Have the XML assignments been graded and returned? I have not received mine.
  • Regarding Greenstone: How do we divide our library into three collections? As of yet all of our content is in one big collection. How can we edit the text on the front page and can we edit the graphical user interface? How can we add captions to image thumbnails? Why do our title searches (and all other searches) give us 0 results? How does one delimit Dublin Core elements? We tried commas and semi-colons unsuccessfully.

Wednesday, November 12, 2008

Week 11 Reading Notes

A Viewpoint Analysis of the Digital Library

Arms lists three viewpoints with which to investigate this question: "Should digital libraries be self-sufficient islands or should we strive for a single global digital library?" The viewpoints are: organizational (such as the Library of Congress), technical, and user's.

The first viewpoint emphasizes the individual library as the source of all knowledge. It does not focus on collaboration or interoperability with other libraries and their interfaces. This viewpoint is often ineffective for users.

The second viewpoint concerns the technical systems of digital libraries. Interoperability between structures and metadata is key, but users are not. This viewpoint has offered many successes, such as XML, Z39.50 and MARC. Unfortunately, these advances are not used to their potential.

The third viewpoint, from the user, is indifferent to technological and organizational viewpoints. Interfaces are not adequately uniform from institution to institution. Organizations such as the Library of Congress may not even be in the user's subconscious.

Arms advocates more study of the user's viewpoint. He suggests holistic evaluations in which the user accesses multiple libraries.

Social Aspects of Digital Libraries

This paper, the result of a workshop between UCLA and the National Science Foundation, asserts that digital library creators should be more concerned with their social context. The article states an obvious but often ignored goal: "digital libraries should be constructed in a way that accommodates the actual tasks and activities that people engage in when they create, seek, and use information resources."

Users, if unimpressed by institutional digital libraries, "will construct digital libraries on their own behalf." Thus, digital library creators should follow one of the basic tenets of Website portal design: allow the interface and contents to be organized according to individual user preferences.

The Information Life Cycle has three stages: creation (when the digital object is active - being created, modified, and indexed), searching (when the digital object is semi-active - being stored and distributed), and utilization (when the digital object is inactive - being discarded or mined).

Some of the issues that stand out to me are:
  • how to facilitate information sharing across multiple user communities
  • how to describe and organize content in flux (such as Web sites)
  • when to use human versus automated indexing, despite human indexing's cost and time
  • whether to create a single interface for a library, or different interfaces that are more useful for different groups (such as a simpler interface for children and a highly manipulable, complex interface for academicians)

The conclusion of this report is similar to Arms' conclusion, that user interaction with digital libraries needs to be studied further, especially among different cultural groups.

The Infinite Library

This article evaluates Google Book Search rather objectively.

Some concerns include:
  • entrusting global literary heritage to a corporation
  • libraries devoid of physical content; libraries as lonely shells for preservationists
  • libraries' inability to share their digital copies of scanned books with anyone but Google
  • handwritten texts that are unsearchable via OCR
  • digitizing books in the less stringent Internet Archive instead of Book Search

Some benefits include:
  • increased need for librarians to help guide patrons through the morass of online text
  • global access for books previously available only in noncirculating or restricted libraries

My personal concern is the sunny belief that Google is a "good citizen" as librarian John Wilkin notes. Google is relatively good now, but that guarantees nothing for the future. Like civilizations, companies rise and fall, go bankrupt, get bought out by conglomerates, and so on. Google's collaboration with the Chinese government to censor public Internet access does not qualify as good citizenship.

Week 11 Muddiest Point and Question

I have no muddiest point, but I have a question about Greenstone. How can I upload files into Greenstone? My understanding is that under the Gather tab, I can choose Local Filespace, choose a drive and file. When I try to drag and drop this file to the right, under Collection, nothing happens. Some students have expressed interest in dedicating some class time to a brief Greenstone tutorial about, for instance, uploading files. Is this possible?

Friday, October 24, 2008

Week 10 Reading Notes

Digital Library Design for Usability

This article outlines five models of computer systems design. The authors find several of these models lacking. The most successful design elements gleaned from these five models include:
  • Learnability: the user can start using a digital library quickly without picking up a lot of new skills.
  • Memorability: the user can remember how to use the library after a significant length of time.
  • The user should be able to recover from errors.
  • The user should be able to save search results or search paths for later use.
  • Users within an organization should be able to get training and guidance on using the library.
  • Library prototypes should be tested on end users and revised before the final product is released.
  • Proactive "agents" that know a user's preferences can alert her to new items of interest.

Evaluation of Digital Libraries: An Overview

Saracevic's typo-riddled article points out that evaluation of digital libraries, especially commercial libraries, is rare. When digital libraries are evaluated, a systems-centered approach is most common. Human- and user-centered approaches are less common. To me this is problematic; if digital libraries are used by humans, their needs should be evaluated first. Perhaps this is why Saracevic notes that "users have many difficulties with digital libraries" such as ignorance of the library's capabilities and scope.

I strongly disagree with his assertion that "it may be too early in the evolution of digital libraries for evaluation." Even when he wrote his article in 2004, many digital libraries were in existence. Now there are even more, and the number is growing all the time. Institutions spend hundreds of thousands of dollars on commercial digital libraries alone, so they should have some evaluation results on which to base their funding allocations.

Arms Chapter 8: User Interfaces and Usability

Arms details some reasons for the disconnect between end users and digital libraries. First, the interfaces, collections, and services in digital libraries change constantly, but the user adapts slowly. This can cause much frustration. Second, digital libraries were initially used primarily by experts who understood what they were using. Now that the Internet is nearing ubiquity, fewer digital library end users are experts. They "do not want to spend their own time learning techniques that may be of transitory value." Thus digital libraries must be accessible to both skilled and unskilled end users.

Arms lists four parts of a digital library's conceptual model: interface design, functional design, data and metadata, and computer systems and networks.

Several points stood out to me:

  • To increase space on the screen for content, remove on-screen manipulation buttons and have the user navigate with keystrokes.
  • Structural metadata is required to relate page sequence with actual page numbers. The page number in the original document rarely matches the sequence of the digital version, since prefaces and tables of contents are seldom numbered.
  • To reduce the time of page loading, data can be sent to the user's computer before she requests it. If she is viewing page 6, for instance, the computer can "pre-fetch" page 7 in the meantime.
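
The last two points can be sketched together. This is a minimal illustration, not anything from Arms: the `PAGE_LABELS` list, the `sequence_for_label` helper, and the `PageViewer` class are invented, and the pre-fetch here happens synchronously for simplicity, where a real viewer would likely load the next page in the background.

```python
# Structural metadata: position in the scanned sequence -> label printed on the page.
# Unnumbered front matter gets None. All values are invented for illustration.
PAGE_LABELS = [None, None, "i", "ii", "1", "2", "3"]


def sequence_for_label(label):
    """Return the 0-based scan position of the page bearing the given printed label."""
    return PAGE_LABELS.index(label)


class PageViewer:
    def __init__(self, fetch):
        self.fetch = fetch  # callable: scan position -> page data
        self.cache = {}     # pages already transferred to the user's computer

    def _load(self, seq):
        # Transfer the page only if it is not already held locally.
        if seq not in self.cache:
            self.cache[seq] = self.fetch(seq)
        return self.cache[seq]

    def show(self, seq):
        page = self._load(seq)
        # Pre-fetch the following page, if there is one, before the user asks for it.
        if seq + 1 < len(PAGE_LABELS):
            self._load(seq + 1)
        return page


calls = []


def fake_fetch(seq):
    """Stand-in for a network request; records which pages were transferred."""
    calls.append(seq)
    return f"scan-{seq}"


viewer = PageViewer(fake_fetch)
first = viewer.show(sequence_for_label("1"))
```

Note that the page printed as "1" sits at scan position 4 in this made-up book, which is exactly the mismatch the structural metadata has to record.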

Some of Arms' suggestions for digital libraries:

  • They should be accessible from anywhere on the Internet.
  • The interface should be extensible.
  • Content should be accessible indefinitely. (This tenet seems under threat by copyright laws and DRM.)
  • Interfaces should be customizable.
  • Spatial representations of library content can aid the user's memory and increase access.
  • Interfaces should have consistent appearance, controls, and function.
  • The interfaces should provide feedback to users about what is happening.
  • Users should be able to stop an action or return.
  • There should be several ways for the user to complete the same task; some routes can be simple for the novice user while some routes can be faster for experts.
  • Interfaces should be accessible regardless of a user's computer display preferences, Internet speed, or operating system.
  • Caching and mirroring should be used to reduce delays in information transfer over the Net. Through mirroring, the user accesses the content closest to her, though it may be stored on several servers around the globe.
  • Summarize the user's choices.
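
The caching and mirroring suggestion above can be sketched roughly as follows. The mirror names, the latency figures, and both functions are hypothetical; "closest" is approximated here by the lowest measured latency, which is only one possible definition.

```python
from functools import lru_cache

# Hypothetical mirrors and their measured round-trip times in milliseconds.
MIRROR_LATENCY_MS = {"us-east": 40, "europe": 120, "asia": 210}


def closest_mirror(latencies_ms):
    """Pick the mirror with the lowest latency."""
    return min(latencies_ms, key=latencies_ms.get)


@lru_cache(maxsize=128)
def fetch_object(object_id):
    """Pretend to fetch an object from the closest mirror; repeat requests hit the cache."""
    mirror = closest_mirror(MIRROR_LATENCY_MS)
    return f"{mirror}/{object_id}"


location = fetch_object("page-7")
```

A second call to `fetch_object("page-7")` is answered from the local cache without consulting any mirror.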