Friday, October 17, 2008

Week 8 Reading Notes

The Truth About Federated Searching

I do not completely trust the veracity of this article because it comes from a private corporation that sells federated search technology.

  • Federated searching is a Web-based search of multiple databases.
  • User authentication can be problematic. Federated searches should be available to patrons on-site as well as off-site.
  • True de-duping, which eliminates all duplicate search results, is a myth. For every duplicate record to be caught, the search engine would spend hours processing. Still, it seems wise to de-dupe if only to eliminate obvious duplicates in the initial result sets (see the sketch after this list).
  • Relevancy ranking is based only on words in the citation, not in the abstract, index, or full text. It therefore misses many crucial keywords, which suggests skimming several pages of results instead of assuming the first 5 or 10 will be the most useful.
  • "You can't get better results with a federated search engine than you can with the native database search." So why use a federated search? It still saves time to search several databases at once than to seek out the portals of individual databases. That is, unless you know of one or two individual databases that consistently provide you with results relevant to your topic.
  • One of Hane's myths, that federated searching is software, actually has an upside. Because federated searching is a service, not software, libraries do not need to update the translators for each database on a daily basis.
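
To make these points concrete, here is a minimal Python sketch of a federated-search pipeline: fan a query out to several databases, merge the results, drop obvious duplicates by comparing normalized citation fields, and rank by overlap between the query and citation words only. The record fields, source functions, and scoring heuristic are my own illustrative assumptions, not how any vendor's product actually works.

```python
# Toy federated search: fan out, merge, de-dupe on citation fields, rank on
# citation words only. Everything here is a hypothetical illustration.
from dataclasses import dataclass


@dataclass
class Record:
    title: str
    authors: str
    year: str
    source: str  # which database returned it


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so near-identical citations compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()


def dedupe(records):
    """Drop records whose normalized title/authors/year already appeared.
    This only catches obvious duplicates; catching every duplicate would
    take far more comparison work, which is the article's point."""
    seen, unique = set(), []
    for rec in records:
        key = (normalize(rec.title), normalize(rec.authors), rec.year)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def rank(records, query: str):
    """Score by overlap between query terms and citation fields only --
    nothing from the abstract or full text, mirroring the limitation above."""
    terms = set(normalize(query).split())

    def score(rec):
        citation_words = set(normalize(rec.title + " " + rec.authors).split())
        return len(terms & citation_words)

    return sorted(records, key=score, reverse=True)


def federated_search(query: str, sources):
    """Query every source, then merge, de-dupe, and rank the combined set."""
    merged = []
    for search in sources:  # each `search` is a hypothetical per-database function
        merged.extend(search(query))
    return rank(dedupe(merged), query)


if __name__ == "__main__":
    # Two toy "databases" returning overlapping, made-up citations.
    def db_one(q):
        return [Record("Federated Search Basics", "Smith, A.", "2003", "DB One")]

    def db_two(q):
        return [Record("Federated search basics.", "Smith, A", "2003", "DB Two"),
                Record("Library Catalogs for the Google Generation", "Jones, B.", "2004", "DB Two")]

    for rec in federated_search("federated search", [db_one, db_two]):
        print(rec.title, "--", rec.source)
```

Even this toy version shows both limitations from the list: a duplicate whose citation is formatted a little differently would slip past de-duping, and a relevant record whose keywords appear only in the abstract or full text would score zero.
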
Federated Searching: Put It in Its Place

Miller's article supports using federated searching in conjunction with library catalogs. In many ways, his 2004 article is woefully outdated. The University of Pittsburgh's digital libraries are already searchable through a Google-like search box; this engine, called Zoom!, is much slower and less reliable than Google. On the other hand, the article is still relevant and still being ignored. The University of Pittsburgh's actual catalog (as well as the Carnegie Library's catalog, and many others) does not have a Google-like interface. Users type a search term but also choose limits such as title, author, or location. Personally, I find this useful and easy, but for a generation raised on Google, library catalogs probably need to evolve.

I agree with Miller that "Amazon ... has become a de facto catalog of the masses." When I was a reference assistant, finding a specific title for a patron was much easier in Amazon than on the library's own catalog. Amazon's visual aspects, ease of searching, patron reviews, and readers' advisory were usually preferable. For research, I often consult WorldCat before searching a local catalog.

Search Engine Technology and Digital Libraries

Lossau confronts the problem of including non-indexed Web content in library searches. First, much of the Web's content is not appropriate for the academic world because it lacks authenticity and authoritativeness. Second, much of this information changes constantly. Third, the content is not guaranteed to persist, or to stay at the same location.

Much of the deep web, however, contains highly authoritative information, especially regarding the sciences. To access this, Lossau suggests a search index with a customizable interface that can mesh with existing local portals. Users should be able to display their result sets in a variety of ways. Automatically extracted metadata can improve access to useful but difficult-to-find materials that have not been indexed by a human.
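
As a rough illustration of what "automatically extracted metadata" might mean, here is a minimal Python sketch that pulls a title and year out of a plain-text document with naive heuristics. Lossau does not prescribe a particular method, and real systems are far more sophisticated; the heuristics and sample text here are my own assumptions.

```python
# Naive metadata extraction: treat the first non-empty line as the title and
# grep for a four-digit year. Purely illustrative, not Lossau's method.
import re


def extract_metadata(text: str) -> dict:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    title = lines[0] if lines else ""
    year_match = re.search(r"\b(19|20)\d{2}\b", text)
    return {
        "title": title,
        "year": year_match.group(0) if year_match else None,
    }


sample = """Deep Web Resources in the Sciences
Published 2004 by an institutional repository."""
print(extract_metadata(sample))
# {'title': 'Deep Web Resources in the Sciences', 'year': '2004'}
```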
