HathiTrust is currently experimenting with large-scale full-text searching as part of an effort to create a mechanism to search across the entire repository. As an initial public beta of full-text search functionality, we are offering a simple mechanism to search across all of the fully viewable works (both those in the public domain and those for which we have permissions) and a sprinkling of search-only works (i.e., in-copyright works where we may not show the text of the work).
The size of the content indexed is approximately 500,000 volumes, and the majority of the works are fully viewable. Although this is a fully functioning and reliable search mechanism for these works, we provide it as a public beta in order to learn more about these large search indexes in a public setting.
More information on our process to explore issues in this area is available in the HathiTrust.org large-scale search report.
Original post, 02008 10 14:
An exciting development in the world of digital repositories, and possibly a first for North America, is the HathiTrust Shared Digital Repository:
A group of the nation’s largest research libraries are collaborating to create a repository of their vast digital collections, including millions of books, organizers announced today. These holdings will be archived and preserved in a single repository called the HathiTrust. Materials in the public domain will be available for reading online.
Launched jointly by the 12-university consortium known as the Committee on Institutional Cooperation (CIC) and the 11 university libraries of the University of California system, the HathiTrust leverages the time-honored commitment to preservation and access to information that university libraries have valued for centuries. UC's participation will be coordinated by the California Digital Library (CDL), which brings its deep and innovative experience in digital curation and online scholarship to the HathiTrust.
"This effort combines the expertise and resources of some of the nation’s foremost research libraries and holds even greater promise as it seeks to grow beyond the initial partners," says John Wilkin, associate university librarian of the University of Michigan and the newly named executive director of HathiTrust. Hathi (pronounced HAH-tee), the Hindi word for elephant incorporated into the repository's name, underscores the immensity of this undertaking, Wilkin says. Elephants also evoke memory, wisdom, and strength.
As of today, HathiTrust contains more than 2 million volumes and approximately 3/4 of a billion pages, about 16 percent of which are in the public domain. Public domain materials will be available for reading online. Materials protected by copyright, although not available for reading online, are given the full range of digital archiving services, thereby offering member libraries a reliable means to preserve their collections. Organizers also expect to use those materials in the research and development of the Trust.
Volumes are added to the repository daily, and content will grow rapidly as the University of California, CIC member libraries, and other prospective partners contribute their digitized content. Also today, the founding partners announce that the University of Virginia is joining the initiative.
Each of the founding partners brings extensive and highly regarded expertise in the areas of information technology, digital libraries, and project management to this endeavor. Creation of the HathiTrust supports the digitization efforts of the CIC and the University of California, each of which has entered into collective agreements with Google to digitize portions of the collections of their libraries, more than 10 million volumes in total, as part of the Google Book Search project. Materials digitized through other means will also be made available through HathiTrust.
HathiTrust provides libraries a means to archive and provide access to their digital content, whether scanned volumes, special collections, or born-digital materials. Preserving materials for the long term has long been a mission and driving force of leading research libraries. Their collections, accumulated over centuries, represent a treasury of cultural heritage and investment in the broad public good of promoting scholarship and advancing knowledge. The representation of these resources in digital form provides expanded opportunities for innovative use in research, teaching, and learning, but must be done with careful attention to effective solutions for the curation and long-term preservation of digital assets.
"The CIC Libraries have always worked at a large scale, with big collections, big user communities and high expectations for service. They are not intimidated by big challenges, and will bring their comfort with this to the development of the shared digital repository," says Mark Sandler, director of the CIC Center for Library Initiatives.
"The University of California libraries have an unparalleled reputation for innovation in digital library development and inter-institutional collaboration," says Laine Farley, interim executive director of the California Digital Library."Participation in the HathiTrust continues this tradition and will enable UC to provide its students and scholars with access to one of the most significant digital collections ever assembled." Adds Brian Schottlaender, the Audrey Geisel University Librarian at UC San Diego, "The University of California Libraries are pleased to work collaboratively with our CIC colleagues to build a rich and coherent resource accessible to scholars for the long-term."
"Researchers will benefit from the expert curation and consistent access they have long associated with the CIC research libraries," says Michael McRobbie, president of Indiana University. "Great libraries have long been essential to outstanding scholarship, and the HathiTrust collaboration among the CIC institutions, the University of California and others provides an essential tool for 21st- century scholars."
"Digitization of print texts has the promise of being transformative of scholarship and of library practice," says Paul Courant, University of Michigan librarian, dean of libraries, and former provost. "In both areas, the ability to search many texts and to preserve texts accessibly creates tremendous opportunities for collaboration amongst scholars and universities. HathiTrust has made a good start, and like the elephant for which it is named, we expect that it will prove able to carry and deliver valuable resources with grace and reliability."
"Before this collaboration," Wilkin says, "the collections in each library existed in isolation. Now we are bringing them together, pooling resources and eliminating redundancies, and producing a valuable research tool that will be greater than the sum of its parts." ...
Source: DIGLIB mailing list, 02008 10 14