National Archives sign at Kew Gardens Station

2009-06-21

RODA: open source digital repository software for archives, from Portugal

From the announcement on DIGITAL-PRESERVATION@JISCMAIL.AC.UK (02009 06 21):



RODA is an open source digital repository specially designed for archives, with long-term preservation and authenticity as its primary objectives. Created by the Portuguese Directorate-General for the Portuguese Archives in partnership with the University of Minho, it was designed to support the most recent archival standards and become a trustworthy digital repository.

RODA embodies high level standards of security, scalability and usability. Its centralized architecture enables an easy management while the self-deposit ingest tools and workflow procedures account for the scalability of the human-resources.

RODA's main features are:

- Based on standards (OAIS, METS, EAD, PREMIS, etc.)
- Long-term Preservation and Authenticity
- TRAC compliant (Trustworthy Repositories Audit & Certification)
- Secure (SSL, fine grained permissions)
- Scalable (SOA)
- Clean web user interface and ingest desktop tool
- For archivists, for producers, for consumers
- Open source

A DEMO installation of RODA is now open to the general public at the URL: http://roda.di.uminho.pt


RODA, which has a bilingual Portuguese and English interface, appears to be built on Fedora. A RODA ingest tool, RODA-in, is available under the Pre-Ingest menu for Windows or any Java-enabled platform. The software itself does not appear to be available for download.

2009-06-11

Celebrate 200 Years of Darwin with Amazon.com

Amazon.com jumped on the Darwin twin-anniversary bandwagon with this page of books by and about Darwin and his radical theory of biological evolution, first published in his bestselling 1859 book On the Origin of Species.

2009-06-03

Google Wave generating a tsunami of interest

I guess you saw that one coming. Check out the official Google Wave preview. There's lots of hype, which you can read about through this Google search. Wave depends on the Google Wave Federation Protocol, so don't forget to go over it to learn even more. This is what the Google Wave API says about Google Wave:

Google Wave is a product that helps users communicate and collaborate on the web. A "wave" is equal parts conversation and document, where users can almost instantly communicate and work together with richly formatted text, photos, videos, maps, and more. Google Wave is also a platform with a rich set of open APIs that allow developers to embed waves in other web services and to build extensions that work inside waves.

Scoopler (beta) real-time search results

Here's an impressive-looking proof of concept undergoing beta testing: Scoopler, a search engine that gives you real-time search results.

WARC file format published as an international standard

The big news today in the Web preservation world is the publication of the WARC file format as an international standard. Here's most of the announcement as circulated to various mailing lists:

The International Internet Preservation Consortium is pleased to announce the publication of the WARC file format as an international standard: ISO 28500:2009, Information and documentation -- WARC file format.

[http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=44717]

For many years, heritage organizations have tried to find the most appropriate ways to collect and keep track of World Wide Web material using web-scale tools such as web crawlers. At the same time, these organizations were concerned with the requirement to archive very large numbers of born-digital and digitized files. A need was for a container format that permits one file simply and safely to carry a very large number of constituent data objects (of unrestricted type, including many binary types) for the purpose of storage, management, and exchange.

Another requirement was that the container need only minimal knowledge of the nature of the objects.

The WARC format is expected to be a standard way to structure, manage and store billions of resources collected from the web and elsewhere. It is an extension of the ARC format [http://www.archive.org/web/researcher/ArcFileFormat.php ], which has been used since 1996 to store files harvested on the web. WARC format offers new possibilities, notably the recording of HTTP request headers, the recording of arbitrary metadata, the allocation of an identifier for every contained file, the management of duplicates and of migrated records, and the segmentation of the records. WARC files are intended to store every type of digital content, either retrieved by HTTP or another protocol.

The motivation to extend the ARC format arose from the discussion and experiences of the International Internet Preservation Consortium [ http://netpreserve.org/ ], whose core mission is to acquire, preserve and make accessible knowledge and information from the Internet for future generations. IIPC Standards Working Group put forward to ISO TC46/SC4/WG12 a draft presenting the WARC file format. The draft was accepted as a new Work Item by ISO in May 2005.

Over a period of four years, the ISO working group, with the Bibliothèque nationale de France [http://www.bnf.fr/ ] as convener, collaborated closely with IIPC experts to improve the original draft.

The WG12 will continue to maintain [http://bibnum.bnf.fr/WARC/ ] the standard and prepare its future revision.

Standardization offers a guarantee of durability and evolution for the WARC format. It will help web archiving entering into the mainstream activities of heritage institutions and other branches, by fostering the development of new tools and ensuring the interoperability of collections. Several applications are already WARC compliant, such as the Heritrix [http://crawler.archive.org/ ] crawler for harvesting, the WARC tools [http://code.google.com/p/warc-tools/ ] for data management and exchange, the Wayback Machine [http://archive-access.sourceforge.net/projects/wayback], NutchWAX [http://archive-access.sourceforge.net/projects/nutch] and other search tools [http://code.google.com/p/search-tools/] for access. The international recognition of the WARC format and its applicability to every kind of digital object will provide strong incentives to use it within and beyond the web archiving community.

A press release is available on the IIPC website:

http://netpreserve.org/press/pr20090601.php
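
To make the container idea in the announcement a little more concrete, here is a minimal sketch in Python of how a WARC 1.0 record is laid out: a version line, a set of named header fields, a blank line, the content block, and two trailing CRLFs. The function names `build_warc_record` and `iter_warc_records` are made up for this illustration and do not come from any of the tools mentioned above. The sketch also leaves out compression, digests, segmentation and most of the optional fields, so treat it as a reading aid rather than a compliant implementation.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(record_type, target_uri, payload, content_type):
    """Serialize one record: version line, named fields, blank line, block, two CRLFs."""
    fields = [
        ("WARC-Type", record_type),
        # Every record carries its own identifier, written as a URI in angle brackets.
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        # The container only needs the block length, not any knowledge of its content.
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in fields:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    return head + CRLF + payload + CRLF + CRLF


def iter_warc_records(data):
    """Yield (fields, payload) pairs from uncompressed, concatenated records."""
    pos = 0
    while pos < len(data):
        head_end = data.index(CRLF + CRLF, pos)
        lines = data[pos:head_end].split(CRLF)
        if not lines[0].startswith(b"WARC/"):
            raise ValueError("not a WARC record")
        fields = dict(line.decode("utf-8").split(": ", 1) for line in lines[1:])
        start = head_end + 2 * len(CRLF)
        end = start + int(fields["Content-Length"])
        yield fields, data[start:end]
        # Skip the two CRLFs that close the record.
        pos = end + 2 * len(CRLF)


if __name__ == "__main__":
    page = build_warc_record(
        "resource", "http://example.org/a.txt", b"hello, archive", "text/plain"
    )
    for fields, payload in iter_warc_records(page):
        print(fields["WARC-Type"], fields["WARC-Target-URI"], len(payload))
```

Because the reader relies only on `Content-Length`, any number of records of any type can be concatenated into one file, which is the "very large number of constituent data objects" the announcement describes.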

2009-06-01

What's New in Digital Preservation, Issue no. 20 (January-April 02009) available

What’s New in Digital Preservation no. 20 covering the period January to April 02009 is available. It’s compiled by Najla Rettberg for the Digital Preservation Coalition (DPC) and reviewed by PADI, the National Library of Australia.