Traditional libraries collect physical materials and organize them to make them available to users. In the same way, digital libraries collect and organize digital materials and make them available on the internet, although some early ones comprised collections of CD-ROMs and current ones can be read on portable e-book readers.
Even so, many libraries that make their unique materials (manuscript collections and the like) available online are not thereby digital libraries. Digital libraries transcend the collections of any one physical library and exist only online.
Digital libraries have only existed for about 30 years, but they arose from so many different sources that it’s impossible to trace their history in a straight line. They are bound up in the history of the internet, too.
Some internet history
Nowadays, everyone is familiar with the World Wide Web. It’s the most common way of using the internet, but the internet isn’t the same thing as the World Wide Web. In the beginning, starting in the late 1960s, the internet existed for the exchange of information among military and academic researchers. At that time, all computers were mainframes, and computer scientists were just beginning to learn how to link them together in networks.
By 1986, the internet connected computers all over the world. Unfortunately, there was no good way to find what information existed on any of them. Once anyone located information, retrieving it from another computer required using the primitive and awkward File Transfer Protocol (FTP). The Internet Engineering Task Force formed that year to decide how the internet should operate.
At its 1992 meeting, various speakers presented possible breakthroughs. Tim Berners-Lee proposed the World Wide Web, which would connect pages on one computer to pages on another using hyperlinks. He hadn’t actually designed it yet.
The next day Mark McCahill, manager of the Microcomputer Center at the University of Minnesota, and programmer Farhad Anklesaria explained the Gopher project. It, too, used hyperlinks, but it was already working. At first Gopher seemed the more likely to succeed.
The brief reign of Gopher
When McCahill started working at the University of Minnesota, mainframe computers ruled. Microcomputers (such as IBM’s Personal Computer and Apple’s Macintosh) seemed like toys. When the university wanted to network its computers, a committee drew up a list of expectations, including a requirement to keep the mainframe at the center, but produced no code for accomplishing any of it.
Anklesaria put together a new protocol on a Mac, with one machine acting as a server and others as clients. A simple menu allowed users to point and click. Because modems were so slow at the time, it was text only, but it included a full-text search engine. They called their idea Gopher, after the university’s mascot: a gopher is a mammal that digs, and a go-fer goes to fetch things. It took the team about three weeks to write the software.
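The menu mechanism that made Gopher so simple was later codified in RFC 1436: each line a server returns consists of a one-character item type, a display string, a selector, a host, and a port, separated by tabs. A minimal sketch of parsing such a line (the sample menu line is invented for illustration):

```python
# Sketch of parsing a Gopher menu line as defined in RFC 1436.
# Format: <type><display>\t<selector>\t<host>\t<port>\r\n
# The sample line below is invented for illustration.

def parse_menu_line(line: str) -> dict:
    """Split one Gopher menu line into its tab-separated fields."""
    line = line.rstrip("\r\n")
    display, selector, host, port = line[1:].split("\t")
    return {
        "type": line[0],       # '0' = text file, '1' = submenu, '7' = search
        "display": display,    # text shown to the user in the menu
        "selector": selector,  # string the client sends back to fetch the item
        "host": host,
        "port": int(port),
    }

item = parse_menu_line("1Campus Information\t/campus\tgopher.example.edu\t70\r\n")
print(item["display"], item["host"], item["port"])
```

To follow a menu entry, a client simply opens a connection to the listed host and port and sends back the selector string, which is why the whole system could be built in weeks.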
The university’s committee reacted to their presentation with outrage. Although the idea fulfilled all the committee’s requirements, the server-client setup enabled anyone with a PC to use it. It bypassed the university’s central authority. The committee forbade the Gopher team from further work on the project.
When McCahill vowed to quit rather than abandon it, the team was allowed to continue to work on their own time. Since the university was unwilling to adopt it, the team simply made it available via FTP to whoever wanted to use it. It became a viral success. In fact, when Berners-Lee finally introduced a working World Wide Web, he used Gopher to get the word out.
By that time, however, modem speeds had increased. The World Wide Web offered images, color, and the web browser Mosaic. What’s more, it was set up for commercial use and welcomed .com sites. Gopher, still embroiled in university politics in Minnesota, couldn’t keep up. Web traffic eclipsed Gopher traffic in 1994.
The idea of a digital library
It’s impossible to understand digital library history without the World Wide Web. But it takes more than gathering information online to make a digital library.
The internet provides access to digital information anywhere in the world. A digital library must not only curate content but also organize it, mount it on suitable technology, and enable people to find it.
August 1991 marks a convenient place to start considering digital library history. A system called “e-print archive” went online. It was quickly renamed arXiv. It’s a free distribution service for open-access scholarly writings in certain scientific disciplines. Unlike traditional journals, articles are not peer reviewed. Much of the information, however, comes from publicly funded research. By the end of the 1990s, similar projects had started for other disciplines.
These archive services became the prototype for “institutional repositories.” That’s where universities and other research institutions collect and disseminate research conducted by their faculty or staff. They don’t quite constitute digital libraries.
The Digital Library Initiative (DLI) started in the US in 1994. Similar projects exist in other countries, including developing nations. It funds projects that offer more of the functionality of a traditional library, including collection development. These collections comprise not only scholarly papers but also digitized maps, photographs, satellite images, videos, and more. DLI also pulls together research in digital library projects that had previously been fragmented among discipline-specific communities.
Many of the earliest projects that came out of DLI and equivalent initiatives were built from scratch. That is, some team would receive a research grant and design a digital library either to serve a specific community or to explore ways of organizing and presenting particular kinds of information. No other institution or community could simply reuse and install any of them.
Open access architecture and metadata standards
In 1995, the Networked Computer Science Technical Reference Library attempted to bring some order to digital library development. It introduced the concept of open architecture. That is, its design is public, not a proprietary system owned by a particular company. Its structure makes it easy to add, upgrade, or swap out various components.
Open architecture meant that any institution could create a digital library using sets of standard protocols.
These include various metadata structures, that is, data about data. Libraries have provided some kind of catalog since antiquity. Catalog entries include such metadata as the title of an item and where to find it in the library. Modern library metadata also includes authors, date and place of publication, publisher, subject headings, and more.
Henriette Avram developed the first metadata scheme using computer technology for the Library of Congress in the 1960s. Still used today, it’s called MAchine Readable Cataloging (MARC). Digital libraries usually rely on a newer standard called Dublin Core.
A conference held in Santa Fe, New Mexico in 1999 began to establish ways for the various archives to interoperate. It recognized that participating institutions play two key roles. Data providers expose digital resources and their metadata. Service providers harvest that metadata to build services such as search engines or peer review systems.
Each of these various advances has solved some problems and exposed some more. So the evolution of digital libraries continues.
Public digital libraries
Academic and research institutions led all the digital library initiatives described so far, but the history of digital libraries encompasses those meant for the general public, too. They operate in part by digitizing printed books and other materials to make them available electronically. Here are four of them:
Digital Public Library of America
The Digital Public Library of America makes materials from libraries, museums, and archives all over the country available in one place. It has millions of various kinds of documents.
Some of them are organized into online exhibitions, such as a collection of maps called “From Colonialism to Tourism.”
Others are organized into primary source sets, such as the poetry of Maya Angelou. The Digital Public Library also collects government documents and materials for genealogical research.
Project Gutenberg
Project Gutenberg actually predates the personal computer, the internet, and electronic scanners. It started when a student at the University of Illinois, Michael Hart, conceived the goal of making thousands of heavily consulted books publicly available via computer.
As early as 1971, when he digitized the Declaration of Independence, he believed that the general public would eventually have easy access to computers. Without useful scanners, he and a team of volunteers had to type everything to make it available. Now, of course, the team uses scanners to add new materials to its collection.
Project Gutenberg digitizes only books in the public domain. That is, most of them were published in 1923 or before.
Google Books
Google began its project to digitize nearly everything in 2004. Working with several large and mostly academic libraries, it scanned everything in the collections, public domain or not. It’s possible to download public domain books. Google makes books still under copyright available for purchase.
The project exposes a serious conflict between open access to information and copyright law. If a book was still under copyright, Google made only brief snippets of it available. Nonetheless, publishers objected and sued. Courts eventually decided that Google’s procedures constituted fair use under copyright law.
Internet Archive/Open Library
Although a smaller operation than Google Books, the Internet Archive is more ambitious. It wants to encompass all the works of all humanity: every book, piece of music, video, web page, and software ever created. And it wants to make it all available to anyone who wants to access any of it. It also has software to make its electronic files accessible to blind and dyslexic readers.
The Internet Archive doesn’t consider itself a competitor to Google or any other effort. After all, for any organization to digitize what another has already done simply wastes effort. Yet digital copies need to be stored in multiple locations to avoid losing the collection. A digital library on such a large scale demands cooperation.
It will also be necessary to gain cooperation of publishers, who must make a profit. Copyright issues still exist.
“Information wants to be free” is an idiotic slogan. Publishers act as gatekeepers to select the best and most reliable material for publication. The effort requires various levels of editors, physical printing, digital printing, distribution channels, marketing, sales, etc. And no one involved will work for free.
Meanwhile, don’t believe anyone who says that eventually everything will be available digitally. The most heavily used materials will be digitized. That will leave untold numbers of important materials that only a few people care about and might be consulted only a few times in a decade. Who will have the time or money to digitize them?
Also, totalitarian countries want to control the information their citizens can know about. They will not tolerate the open access that defines a true digital library.
Even if the dream of digitizing everything and making it available to everyone can’t quite come about, digital libraries will come as close to it as possible.