About the New Zealand Web Archive
The New Zealand Web Archive is a collection of archived New Zealand and Pacific websites preserved within the National Digital Heritage Archive. Use the web archive to see a visual history of how websites have changed over time. The web archive is part of the Alexander Turnbull Library collections.
For more information about the Web Archive, please visit the National Library of New Zealand website.
Web Archive Discovery Platform Pilot
The Web Archive Discovery Platform pilot provides full-text search over a limited selection of National Library of New Zealand Web Archive collections, in order to evaluate the viability of providing similar access to the full Web Archive.
The pilot includes a harvested copy of the New Zealand Electronic Text Collection (NZETC), previously held by Victoria University of Wellington. It also includes a small thematic collection of websites related to New Zealand general elections, harvested as part of the selective web archiving programme within the Alexander Turnbull Library.
This pilot platform provides full-text search across indexed web archive material. The web archive content itself is viewed and accessed through the National Digital Heritage Archive (NDHA), and search results in the platform link to the content held there. The same web archive material can also be found by searching records in the National Library's online catalogue.
Technology
Webarchive-discovery is a project for data-mining and indexing ARC and WARC files so that their contents can be explored and discovered. Its warc-indexer component parses the (W)ARC files and, for each resource, posts a record to one or more search engine instances. Client-facing tools then allow researchers to query the search engine index and explore the collections.
It is an open-source project actively developed and supported by the British Library and Royal Danish Library, and used by other web archiving institutions.
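To make the indexing step described above more concrete, here is a minimal Python sketch of the general idea: read each record from a WARC file and turn it into a document that could be posted to a search engine. It is an illustration only, not the warc-indexer code, which is a separate implementation that also handles compression, ARC files and many more fields. The function names and the index field names (url, crawl_date, content) are assumptions made for this example, and the sample record is invented.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, body) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"unexpected line: {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # a blank line ends the WARC header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        # Content-Length gives the size of the record block in bytes.
        body = stream.read(int(headers["content-length"]))
        yield headers, body


def to_index_record(headers, body):
    """Build a hypothetical search-index document for one response record."""
    # A response block holds the HTTP headers, a blank line, then the payload.
    _, _, payload = body.partition(b"\r\n\r\n")
    return {
        "url": headers.get("warc-target-uri"),
        "crawl_date": headers.get("warc-date"),
        "content": payload.decode("utf-8", errors="replace"),
    }


def index_records(stream):
    """Collect one index document per 'response' record in the stream."""
    return [
        to_index_record(headers, body)
        for headers, body in iter_warc_records(stream)
        if headers.get("warc-type") == "response"
    ]


# Invented sample: a single response record built in memory.
http_block = b"HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nKia ora"
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"WARC-Date: 2020-01-01T00:00:00Z\r\n"
    b"Content-Length: " + str(len(http_block)).encode("ascii") + b"\r\n"
    b"\r\n" + http_block + b"\r\n\r\n"
)
docs = index_records(io.BytesIO(sample))
print(docs)
```

In a real deployment each document would be sent to the search engine rather than collected in a list, and client-facing tools would then query that index on behalf of researchers.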