LOST AMONG THE DATA

 

By Mitchell W. Pearlman*

 

There’s a sign outside the office of the Executive Director of the New York Committee on Open Government that reads:

"Oceans of Data

Streams of Information

Puddles of Knowledge

Drops of Wisdom"

I don’t know who wrote this quartet of watery imagery, but it makes a compelling point. To paraphrase Daniel J. Boorstin, former Librarian of Congress and author of The Seekers, never before has humankind had so much information available to it but understood so little.

In theory, the availability of vast quantities of data is a good thing. Data can lead to information, which can lead to knowledge, which, in turn, can lead to wisdom. But it seems that in this age of powerful computers and communications systems, we aren’t necessarily better informed, or even more knowledgeable. And I doubt that most of us are any wiser as a result of this new technology.

Modern computers and communication systems have made data one of the most sought-after commodities in the world. I’m defining "data" here as stored information in general. By contrast, I’m defining "information" as specific data that someone seeks to retrieve with a particular purpose in mind. While a single unit of data generally has marginal value, if any, massive collections of data - now called databases or databanks - can, and often do, have great value. That value lies in the fact that they can be searched or "mined" for information.

Earlier this decade our society had to decide whether computers, software and systems would be "open platforms" available to the many, or merely to the few who could afford them. Fortunately, to a significant extent, our business and political leaders opted for open platforms available to the many. Personal computers are now available to most at affordable prices or at public institutions such as libraries, schools and community centers. Software has become so "user friendly" that even the "technophobic" among us can handle, if not master, a PC or a laptop. And computer networks, notably the Internet, make data available in quantities unimaginable as recently as a few years ago.

Now we must consider whether huge, integrated databases and databanks actually aid accessibility or hinder it by leaving the public to search for a needle of information in a data haystack. For accessibility by the population in general is the key not only to an informed society, but to the thoughtful exercise of knowledge, which may be the source of wisdom.

The process by which data are collected, stored and retrieved is called "archiving." The task for electronic archivists on behalf of all of society is to effectively organize and permanently store vast amounts of data so that specific information can be located and retrieved quickly and inexpensively on media compatible with the machines used at the time of retrieval. For example, we must be able to retrieve data stored on today’s floppy disks, tapes and hard drives at a time when such storage media are no longer used or even manufactured.

The magnitude of the data and the rapid pace of technological change have exacerbated the problem of archiving electronic records, and technology hasn’t yet offered satisfactory solutions. This failure puts practical access to much data at risk and limits the ability of users to extract important information from it.

The state of Connecticut is contemplating establishing a computer center that would serve many government agencies. It will have databanks containing enormous amounts of data available to the public under freedom of information laws. If, however, these databanks aren’t designed and managed effectively, they may, in effect, inhibit the ability of citizens to seek out specific information because it’s hidden among a mass of data. For if a person can’t identify or search out the information he or she wants, that information is as useless as if it didn’t exist. Thus, the utility of any large collection of data is directly related to the effectiveness of its archiving system.

The tools by which specific data or information can be located and retrieved today are called "search engines." The problem with the most popular and least expensive search engines is that they can’t discriminate sufficiently among the enormous number of databases and databanks, or among the enormous quantity of data stored in them, to find the precise information sought. Rather, they often return scores, hundreds or even thousands of on-line locations or files in which to look for that information.

For example, what if a person wanted a particular record referring to the governor, and all the search engine could provide is a list of records with the governor’s name in them? That list might contain literally thousands of entries. Unless the information seeker is a skilled researcher or has access to more sophisticated (and expensive) search engines, the seeker, like a mythical wanderer, might be left to tediously search for a seeming eternity through the world of available data.

Interestingly enough, the process of finding and gathering information from the global environment of databases and databanks is called "mining." And like mining for gold or valuable gemstones, it can be laborious and expensive, and it can require good measures of expertise and luck. But given the importance of information in today’s and tomorrow’s world, is it socially, economically, politically or even morally acceptable to predicate information-mining on such factors as time, wealth, expertise or even luck?

In the public sector context, is our government really performing its essential role in a democratic society of providing needed public information if it merely places data on-line knowing full well that a person must have a high degree of technical expertise, a sophisticated search engine and even a well-honed understanding of government records and operations to be able to locate specific information in the electronic haystack of government databases and databanks?

The ability to mine information electronically will, to a large extent, determine not only the "haves" and "have-nots" in our society, but also how informed and knowledgeable our electorate ultimately is. We must therefore insist that publicly available databases and databanks be open, organized and managed to facilitate ease of access to sought-after information. Limiting public access by placing or hiding information among the oceans of data without adequate search and retrieval means must not be tolerated. Only when such access is assured will the promise of the information age be realized: a more informed and knowledgeable citizenry, wiser because of the technology rather than frustrated and kept down by it.

 

______________

*Executive Director, Connecticut Freedom of Information Commission.