Preserving online information is a difficult problem in general.
Federal government websites are especially vulnerable at the end of a presidential term. The End of Term Web Archive has preserved a snapshot of them at the end of each term since 2008.
I have seen web posts that speak of a frantic effort to preserve government information. They attribute it to fear of the incoming Trump administration.
Don’t believe such mindless hysteria.
Regardless of who won the election, the End of Term Web Archive team would be hard at work. Even when a President is reelected, turnover in the cabinet and at other agencies can be high. The new team often takes down the old sites to make room for its own.
If a government document exists in print, some archive preserves it. The National Archives or a presidential library preserves a lot. Government information that exists only on the web easily disappears without a trace, just like any other web-based information.
Who is responsible for archiving the government’s web presence? No one. Some sites have a mandated custodian. Many do not. Of all the PDFs on .gov websites in 2008, 83% had disappeared by 2012.
The first End of Term Web Archive
The idea of an EOT archive came out of the 2008 meeting of the International Internet Preservation Consortium (IIPC).
The National Archives had crawled .gov sites in 2004 and decided not to repeat the effort. Several American institutions belong to the IIPC, including:
- Library of Congress
- Internet Archive
- California Digital Library
- University of North Texas
- Government Publishing Office
These and other American attendees discussed the problem. They realized that they already collected government material for their own organizations. Pooling their efforts in a large-scale collaboration seemed the obvious solution.
The George W. Bush administration was coming to an end. A new administration would take over regardless of how the election turned out. How would government websites change in the transition?
The new archive’s first task was to document the answer to that question. A press release announcing the initiative estimated the lifespan of a government website at 44 days.
The partner organizations asked librarians and other information specialists for help. Volunteers would select and prioritize which sites to include. The partners began to crawl government websites in August 2008 and collected URLs in December.
They collected the same URLs after the inauguration in January 2009, and again in the spring and fall of that year. In all, they gathered some 15.9 terabytes of data documenting how the sites changed.
Second and third End of Term Web Archives
As the Obama Administration comes to an end and the Trump Administration prepares to take office, the Internet Archive and its partners are harvesting webpages from more than 6,000 .gov and .mil domains and more than 200,000 hosts.
These pages comprise material from all three branches of the federal government and regulatory agencies. The partnership is also collecting social media feeds from about 10,000 official federal accounts.
The EOT Web Archive process
The Internet Archive hosts a searchable and browsable public access copy. The Library of Congress keeps a preservation copy. The University of North Texas also holds a copy for data analysis.
A project that occurs once every four years can’t possibly preserve all electronic data from the federal government. But it permanently documents the inevitable changes after every presidential election.
Government website harvest enlists librarians, educators, students / Lisa Peet. Library Journal. December 13, 2016.
Preserving U.S. government websites and data as the Obama term ends / Jefferson. Internet Archive Blogs. December 15, 2016.