A Prognosis for Continued Disarray in Electronic Scholarly Communication

Gregory B. Newby

Graduate School of Library and Information Science

University of Illinois at Urbana-Champaign

501 East Daniel Street, Champaign, IL 61820, USA



Abstract

Scholarly publishing in electronic form is not mature. Common uses and features of print publications do not work well for electronic documents. Missing aspects include the use of keywords and database record formats suitable for information retrieval, inclusion of formatted meta-data such as the author's name and affiliation in documents, the role of commercial or academic publishers as value-added gatekeepers, and perceived value for promotion and tenure.

The growth of scholarly electronic communication has not waited for these features to develop. Indeed, there is gathering evidence that electronic forms of communication, from pre-print archives to electronic journals and discussion groups, may be more important for everyday scholarly life (if not for gaining tenure) than traditional media. Traditional media (journals, books, conference proceedings, etc.) are not threatened by the emerging focus on electronic communication, and indeed have flourished recently. This is not a conflict. Rather, it reflects a mismatch between the activities scholars must engage in to be viewed as productive and tenurable and the activities they must engage in to be well-informed and well-connected.

There are many efforts underway to further legitimize, codify, organize, and otherwise manage scholarly electronic communication. This work will examine the many challenges that must be overcome and provide an estimate of the timeline for their resolution. It is anticipated that the role of relatively unstructured, uncontrolled, and informal electronic scholarly communication will be of continued importance, yet will largely remain independent of efforts to create standards and protocols for electronic books, journals, and other transformed traditional media.

Introduction

There can be no doubt that scholarly electronic publishing of all types plays an extremely important role in the academic world. Access to the Internet is nearly ubiquitous for scholars in North America and Europe. The network's role is crucial for everything from announcing conferences, distributing calls for papers, and publicizing preliminary conference programs and tables of contents to researching, pre-printing, and publishing scholarly works. Scholars frequently subscribe to electronic journals, mailing lists, or network news discussions, and make use of the World Wide Web to retrieve current literature, news, and research. The Internet is a big part of academic life.

Scholarly publishing--perhaps we should prefer the term "scholarly communication"--is the primary means by which the outcome of academic work is shared (at least in modern times). Journal articles, books, conference proceedings, and the like have been the primary delivery vehicle for scholarly work. There is little doubt that the Internet will soon augment these print media as a means of delivery, and is indeed already doing so.

What is taking so long? Why are we not receiving our academic journals on the Web, by email, or in some other electronic form, instead of in print? Examples of electronic journals, conference proceedings, and books abound, yet these are in the minority (and are often of lesser quality) when compared to print publications.

There is no short answer to the question "what is taking so long?" This paper will present parts of a longer answer, and attempt to estimate when the various components of scholarly electronic publishing will fall into place. It is assumed without question that, early in the next millennium, scholarly publishing will take place in electronic forms. Whether this is "good" or "bad" is subject to debate elsewhere; it is submitted here that such a debate is comparable to debating whether automobiles or microwave ovens are good or bad. Scholarly publishing is. In the near future, scholarly publishing will be largely in electronic form.

There are many questions left unanswered here. For example, the Web is often viewed (especially by Internet neophytes) as synonymous with the Internet. Yet the Web will evolve and eventually be replaced. The nature of computing will change; new standards for data exchange and networking will be introduced; television and other media will merge with Internet media... It is very difficult to predict what scholarly publishing will look like in 20 years, but it is not nearly so hard to look at scholarly publishing of the late 1990s to determine what needs to change, what is changing, and what needs to be overcome to allow change.

Four major categories of challenges to the move towards electronification of scholarly publishing will be discussed in this section. Later sections will introduce details on components of the four categories.

One major area of challenge is the relative lack of standards for electronic publications. Web-based publications, electronic journals, mailing list contents, and so forth are difficult to retrieve due to the lack of controlled vocabulary and fields, such as are found in bibliographic databases (for example, Library of Congress Subject Headings, Title/Author fields, etc.). Indexing and searching tools on the Internet (the Internet search engines) are not able to distinguish the relative scholarly value of, for example, a 12-year-old's page of favorite television shows and a media scholar's critique of the state of network broadcasts.

Similarly, the provisions for including basic information about a particular document (meta-information) are weak. Simply identifying the author and title is difficult to do automatically, as is getting information about the publication date and history. These characteristics are particularly evident on the Web, but are not made easier when publications are distributed by email or other means. SGML offers a method to include significant meta-information, but is not yet widely used in public Internet forums. (In addition, the diversity of DTDs makes SGML problematic for standardization.)

A second area of challenge for electronic publishing is perceived legitimacy for the purposes of promotion and tenure. One of the motivations behind a great portion of scholarly publishing is the need of authors to demonstrate the quality of their ideas through acceptance of written work in peer-reviewed journals. In every field there is a recognized hierarchy: the journals with the best reputations, the conferences that are most difficult to be accepted to, and the academic or professional publishers with the strictest standards. Even electronic journals, conferences, and books with strict peer review and a complete editorial board do not have the perceived status that print publications do.

A third area of challenge is the quality of electronic scholarly publications. Quality can include issues such as the presentation, page layout, design, and graphical quality of articles; the peer review and editorial process; or the credentials of authors whose work is published.

A final main area of challenge is the set of perceptions, or models, that academia holds of scholarly electronic publishing. Even if issues of quality, legitimacy, and standards are met, the role of electronic (versus print) publications in academic life depends on how the academic community perceives that role. If ejournals are not perceived to have the same value for tenure decisions as print journals, then they will not have the same value. Likewise, if conferences with only electronic proceedings are not perceived to be of as high quality as those with print proceedings, then that perception will determine how they are judged.

Specific instances associated with standards, legitimacy, quality, and perceptions will be discussed in the following sections, along with the prognosis for their being overcome. Overall, we can anticipate a multi-year transition towards an increased role for electronic publishing. There are, today, hundreds of examples of electronic journals, books, conference proceedings, etc., and millions of examples of Internet resources that are useful or play some role for academic work. In the future, we can anticipate that the term "scholarly publishing" will refer to materials in electronic form, with print used for specific subsidiary purposes such as archiving or appearing opulent. There are still many steps to be taken to reach this future, however.

Informal Communication

Network newsgroups, mailing lists, and Web pages are frequently used to share preliminary research results, discuss issues, and keep in touch with other scholars. The importance of these types of forums varies somewhat in different academic disciplines, but there can be no doubt that many individual scholars gain important benefits from informal electronic communication.

Although books may be published on the Web, and electronic journals may be distributed by email, the largest current use of newsgroups, mailing lists, and Web pages is for content that is not yet ready to be published as a journal article, conference submission, or book. Such forums may be used for "skywriting" (Harnad, 1996), for pre-publication of results, and for many other purposes.

Today, it is easy for scholars to distinguish between, for example, email discussion lists and print journals. Few scholars would be inclined to list the network newsgroups they read on their curriculum vitae, yet most would list every conference presentation or journal article. Although some grey areas exist, there is a fairly definite boundary between "communication" activities of scholars and their "publication" activities. (One notable grey area is that many ejournals publish materials such as short essays that might have also been suitable for distribution to public mailing lists.)

Several areas of change to informal scholarly communication are underway. The first is that archives of communication forums are frequently used as information stores. Archives of mailing lists, current newsgroup contents, and even (though less frequently) logs of IRC sessions or other interactive network forums are available for search or retrieval. This does not necessarily force a change in the communication that takes place in the forums, but it does change the means by which such forums might be accessed.

A second area of change to informal communication is somewhat less obvious, and has to do with gatekeeping and membership in the forums. Moderated newsgroups and mailing lists have been with us for some time, but private lists for scholars are seen less frequently. What we can anticipate is a more structured hierarchy governing who may participate in or post to the most important informal communication forums. This stratification will be for purely pragmatic reasons: readers of the forums are frustrated when the level of discussion is limited by the frequent messages of newcomers, or when commentary is more likely to come from graduate students than from well-known scholars. Private mailing lists already exist, but a model in which discussion is restricted to eminent scholars yet may be observed by anyone interested is seen less frequently.

A final area of gradual change to informal scholarly communication is the means by which participation occurs. Currently, mailing lists have the feature of arriving in one's personal electronic mailbox. Network newsgroups, however, must be sought out with a separate news reading program. Electronic journals might arrive by email, be posted as Web pages, or be made available in other formats. We can expect some shifting in how materials are distributed as search and retrieval techniques are refined. For example, we might anticipate that query-by-profile systems will identify and deliver materials of interest from mailing lists without a subscription to the lists. Another example is the unified front-end for network news, email, and Web pages found in 1997's Web browsers.
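Query-by-profile delivery, in the tradition of selective dissemination of information (SDI), can be sketched briefly. The Python below is a minimal illustration under stated assumptions, not a description of any existing service: it assumes a reader's profile is a hypothetical dictionary of weighted terms, scores each incoming mailing-list message by the weights of the profile terms it contains, and delivers only messages whose score meets a threshold. The `Message` type, the weights, and the threshold are all invented for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    """A hypothetical mailing-list message, reduced to what the filter needs."""
    list_name: str
    subject: str
    body: str


def tokens(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def profile_score(profile, msg):
    """Sum the weights of profile terms that occur anywhere in the message."""
    words = set(tokens(msg.subject + " " + msg.body))
    return sum(weight for term, weight in profile.items() if term in words)


def deliver(profile, messages, threshold):
    """Return messages scoring at or above threshold, highest score first."""
    scored = [(profile_score(profile, m), m) for m in messages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for score, m in scored if score >= threshold]


# Assumed profile and messages, for illustration only.
profile = {"peer": 2.0, "review": 2.0, "ejournal": 1.0}
inbox = [
    Message("list-a", "Call for papers", "Submissions due in May"),
    Message("list-b", "Peer review online", "How should an ejournal run review?"),
]
print([m.list_name for m in deliver(profile, inbox, 3.0)])  # ['list-b']
```

The reader never subscribes to either list; the profile alone decides what arrives. A weighted-term profile is only one possible design, and a real service might instead rank messages with a full retrieval model or match against a controlled vocabulary.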

Informal scholarly communication is greatly facilitated by the Internet. The current generation of new scholars might find it difficult to imagine times when meetings, conferences, letters, and telephone calls were the primary means of academic discussion. To the extent that "weak ties" among scholars are the truly important ones for getting their work done, there is great promise that continued enhancements to how we use the Internet for informal scholarly communication will prove tremendously empowering for all scholars.

The Organization of Information

Electronic library card catalogs, bibliographic databases, CDROMs, and other systems for information retrieval rely on fields for identifying different types of information, and on controlled vocabularies for subject indexing. The tools we use today for accessing the Web, email, electronic journals, etc. do not usually have these capabilities. Even when the meta-information about a particular document is present, there is no guarantee that automatic search engines or browsers will be able to access it correctly.

Standards for the communication of meta-information do exist, however. SGML may be used to tag author, title, and subject fields. Z39.50 is a standard protocol for information search and retrieval, widely used for bibliographic data, that allows multiple client interfaces to access a database such as a library card catalog (WAIS was based on an earlier version of Z39.50). Even with HTML, the META tag allows for the communication of fielded data.
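To illustrate that fielded data carried in HTML is already machine-readable, here is a minimal Python sketch using the standard library's `html.parser` module. It collects the TITLE text and every META element that has both `name` and `content` attributes. The `extract_metadata` function and the sample page are hypothetical, and the META names `author` and `keywords` are common conventions assumed here; HTML itself does not fix a vocabulary of META names.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name, content = attrs.get("name"), attrs.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(html_text):
    """Return a dict of the fielded meta-information found in an HTML page."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    fields = {"title": "".join(parser.title_parts).strip()}
    fields.update(parser.meta)
    return fields


# Hypothetical page marked up with META fields.
page = """<html><head>
<title>Electronic Scholarly Communication</title>
<meta name="author" content="Gregory B. Newby">
<meta name="keywords" content="electronic publishing, scholarly communication">
</head><body>...</body></html>"""
print(extract_metadata(page))
```

For the sample page this prints a dictionary with `title`, `author`, and `keywords` entries. The hard part, as the surrounding text argues, is not reading such fields but getting authors to supply them consistently.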

The problem lies less in the ability to include meta-information than in the inability to use it effectively. Perhaps more important is the problem of people self-authoring their own materials on the Internet (for Web pages, email discussion groups, or even scholarly papers or conference proceedings) without knowledge of how to apply such meta-information.

The solution to this problem will likely come in the near term, through the tools we already use to access electronic information. New HTML tags are introduced frequently (the current META tag may be used to communicate author information), and TITLE already exists but is used more for a running heading than an actual document title. Other fields can be introduced, and search engines will be able to offer the capability to search on these fields. This will lead to problems of training people to use such fields effectively, but this is less of a problem for the academic community than for the general public. Regardless, the fact that millions of computer users have overcome the difficulty of mastering such arcane skills as HTML, URLs, and email addressing gives hope that the public can learn to use features such as fields, authority lists, and query expansion and truncation effectively.
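Fielded searching with truncation, of the kind search engines could offer once such fields exist, can be illustrated with a small Python sketch. The record layout, the field names (`title`, `author`, `subject`), and the `field:term` query syntax with a trailing `*` for right-truncation are assumptions chosen for illustration and are not drawn from any particular search engine. The titles and authors in the sample catalog come from this paper's reference list; the subject headings attached to them are hypothetical.

```python
import re


def tokens(text):
    """Split text into lower-case alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_matches(value, term):
    """True if a word in the field value matches term.

    A trailing '*' requests right-truncation, i.e. a prefix match.
    """
    words = tokens(value)
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return any(word.startswith(prefix) for word in words)
    return term.lower() in words


def fielded_search(records, query):
    """Return the records that satisfy every 'field:term' clause (Boolean AND)."""
    clauses = []
    for clause in query.split():
        if ":" not in clause:
            raise ValueError(f"expected field:term, got {clause!r}")
        clauses.append(clause.split(":", 1))
    return [record for record in records
            if all(field_matches(record.get(name, ""), term)
                   for name, term in clauses)]


# Hypothetical catalog records; the subject headings are invented.
catalog = [
    {"title": "Digital Library Models and Prospects",
     "author": "Newby, Gregory B.", "subject": "digital libraries"},
    {"title": "Implementing Peer Review on the Net",
     "author": "Harnad, Stevan", "subject": "peer review; electronic journals"},
]
print([r["author"] for r in fielded_search(catalog, "subject:journal*")])
# ['Harnad, Stevan']
```

Restricting a term to one field is what separates "author:newby" from a full-text match on the same word anywhere in a document; an authority list could be layered on top by mapping variant author names to one heading before matching.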

Information retrieval tools for full text exist, but do not usually perform very well except with trained searchers. While efforts are underway to develop more sophisticated means of dealing with full text (Harman, 1994), the greatest hope for the near term is to add capabilities to search network-based publications using existing types of IR systems.

Involvement of Commercial Publishers

Commercial publishers (and we might include academic presses in this category, for the purposes of this section) are in the business of creating products for sale. It has been demonstrated that the actual physical publication--the journal or book--accounts for only a portion of the costs of the publication process (see Fisher, 1996). Editing, reviewing, proofreading, publicizing, and many other activities are involved. In the case of commercial publishers, a goal is to profit from the income generated from the publications. Even in the academic press world, there is a necessity to strive to break even, if not profit.

Solutions to the needs of publishers to profit from their work on electronic publications are forthcoming, but have not yet emerged. A variety of economic models exist (see Newby, 1996), none of which exactly matches the one-item-one-fee approach suited to books and journals.

The forthcoming solutions involve stronger emphasis on copyright, and creating forums for the distribution of published items on a per-use basis. Although subscriptions to book series and journals will still exist, we can anticipate a far greater role for pay-once-use-once schemes for accessing electronic publications. For example, a Web search might yield an abstract for a scholarly article. Someone seeking to read the article could provide payment, then get access to the article to read and perhaps print one copy. The publisher could thus expect to generate revenue from its products over a far longer period than it does currently, because current models for print publications involve buying a copy of a book, journal, etc., and then using it in perpetuity. The publisher would sacrifice the one-time payment for the book, but then reap profits from its perpetual use.

Many forces on the Internet are working to assure the security of network-based transactions, where information or goods are delivered immediately based on interactive payment. Use of the Internet for commerce is already upon us, and the amount of commerce on the Internet will grow exponentially through at least the first years of the millennium. Publishers will be able to use the same mechanisms as any merchant.

A remaining problem of concern to publishers is the issue of copyright and piracy. Currently, there is little to prevent someone with a single electronic copy of, say, a journal article from distributing that article to her friends and colleagues without a charge. Publishers want to be able to ensure they can get compensation for every copy, without fear of illegal duplication. Although past history with software, music, and even print publications demonstrates the difficulty of preventing piracy, every indication is that piracy will become far easier. For example, one impediment to my copying an entire electronic conference proceedings to my personal hard drive (and perhaps making copies for my friends) is the size of the files involved. But as the storage capacity on my home PC exceeds several gigabytes, and the ability to write CDROM becomes commonplace, the size of the files involved (and even the network bandwidth needed to retrieve them) will become trivial.

Publishers must work in several areas to combat piracy. First, an effort must be made for authoritative sources to be easily and cheaply obtainable. If a pirated copy is easier and cheaper to get than the original, this will create a problem for publishers. Second, publishers must help ensure that users know about copyright law. Many individuals will prefer to do the 'legal' thing, but today's Internet offers plenty of evidence that most people do not understand the copyrighted status of electronic documents. Third, publishers must make their materials non-trivial to copy. This point is in conflict with current easy standards such as HTML, but fits reasonably well with Adobe PDF files and SGML. An example from the software world is the case of Microsoft Office on the Macintosh, where files are stored in at least 4 different locations on the computer, making it impossible to simply copy one directory to another computer to steal the software. Finally, and most importantly, publishers should strive to give end-users reason to make use of their publications on an ongoing basis. This can be accomplished by embracing the dynamic capabilities of the electronic world: providing interactive forums for readers; updating publications on a frequent basis; being proactive about developing publications based on interest in current publications, and so forth.

Editorial Structure

Print journals and conference proceedings of the mid-1990s involve entire teams of people: editorial boards, layout experts, graphic designers, a reviewing corps, and so forth. At the same time, most electronic journals and conference proceedings are the work of only a few people, sometimes only one person. The great empowerment that the Internet plus modern computing tools offer to authors enables such electronic publications, but at the cost of the quality that comes from involving other people and their expertise.

There are only a handful of electronic journals that have editorial quality comparable to that of print publications. Yet it is the editorial board, the editor, and the publisher that help to maintain the stature of leading print publications.

There is no quandary here, it seems: the definition of the "best" or "most important" publications is, and has been, based on the quality of the works they contain, the authors they attract, the editorial board they list, and the overall professional presentation of the publication. There is every reason to suspect this set of criteria applies regardless of whether the format of the publication is print or electronic. There is some doubt about whether publishers are a necessary component or not, but the print world has certainly demonstrated the value that publishers can add to scholarly publications.

The mission for creating "important" scholarly publications in electronic form is fairly clear, and some publications have already taken the necessary steps. Resolution of some of the other problems mentioned here will aid in progress towards the creation of electronic publications with the same editorial quality as print publications, but (as some key electronic journals demonstrate) there is no significant technical or social barrier to their creation today.

Longevity of Electronic Publications

The Internet has not yet been successful as an archival location for storage of publications (with a few notable exceptions; see http://www.archive.org). On the Web, outdated material (such as announcements for last year's conference) can make a site appear poorly maintained, especially when Internet search engines lead directly to last year's conference rather than to this year's, or to the sponsoring organization's front page.

Only 50% or so of mailing lists and newsgroups are archived, and the archives are seldom perpetual. Rather, archives of last year's mailing list content might be deleted to make space for this year's archives. The cost of online storage is the culprit here--for even as disk drives get cheaper, the demands on system administrators for new mailing lists, more Web pages, and large disk quotas force continued diligence over allocation of resources.

In academic settings, there is typically an office for archives, or an archival library that's part of the main library. Modern archivists are well aware of the limitations of storage in electronic form, and only accept items such as floppy disks or magnetic tapes with the foreknowledge that these materials will be almost completely unreadable within just a few years. In the academic library setting, there is competition among budget items to acquire books and periodicals and develop computing facilities, in addition to general upkeep, salaries, etc. It does not seem likely that many libraries will be able to develop electronic archival capability (even for their own in-house materials).

At a typical college or university, a computing services office maintains campus-wide facilities for computing, networking, Web page storage, etc. Even in the universities that have appointed an "information czar"--a vice-chancellor or other highly-placed individual with joint responsibility for the library and the computing environment--it is unlikely for the computing services office to engage in active archival activities.

What we can expect, for the next few years, is a tremendous and ongoing--and permanent--loss of electronic materials. As individual faculty move on, or as old computers are retired, or policies shift, or this semester's classes start, the old Web pages, mailing list archives, newsgroup contents, and so forth will be removed. As a new version of an electronic book is authored, the old version will be purged. It will take years yet for the academic environment to adjust to the needs of identifying and permanently archiving electronic materials. This function seems destined for the library, yet the library is not yet ready. One important step to their readiness will begin shortly, when libraries start to acquire publications in electronic form. A few have taken steps in this direction by subscribing to and archiving mailing lists and electronic journals. The larger step will not occur until the library must pay the same large annual subscription fee for an electronic journal as it already does for a print journal, CDROM database, book, etc.

In the commercial world, we can forecast a brighter near-term future. Inasmuch as access to older materials is valuable, there will be database providers or other vendors who will maintain such access. Thus, we can imagine that issues of commercially published electronic journals will remain available. There is still cause for concern, however: we know that out-of-print books still retain their copyright (at least for 75 years or so, depending on your country). Yet legal permission to reprint these out-of-print books, perhaps for a college seminar, is difficult and costly to obtain. Can we expect the same difficulties to occur with out-of-print electronic publications, where unusually large fees are levied for access to materials?

Luckily, the role of scholarly commercial publishers will still be tightly bound to the need of scholars to have their work published for the purpose of obtaining tenure. We can expect some level of responsibility, then, on the part of the publishers to maintain permanent access to such works, even if a different fee structure applies for older materials.

Libraries can be expected to play their part in maintaining permanent access to materials they acquire (at least to the extent they currently do for print materials). However, they may be limited by the copyright or licensing constraints of the publisher. For example, it is current practice for many CDROM database vendors to require that all old copies of the CD be returned when a revision comes out, and that the library may not keep any copies after they cancel the subscription. In this case, the library is unable to retain access to materials except as provided by the vendor.

Legitimacy of Electronic Publications

As should be clear from the sections above, there are some good reasons why tenure review committees are largely not ready to accept electronic publications as having the same value as print publications. Apart from the editorial process and quality of the electronic publications, the main issue is simply that most current electronic publications do not have editorial boards with the same "big names" as leading journals do. Many are maintained by one or a few junior faculty, and many more encourage the publication of student papers or do not enforce peer review.

When, as is inevitable, the proportion and visibility of electronic scholarly publications shifts so that there is a far greater number of journals, books, conference proceedings, etc. that have the same indicators of high quality and respectability as current print publications do, there will be no further need to convince tenure review committees of their worth. It appears unlikely, however, that this shift will be accompanied by a wholesale power shift away from commercial publishers and faculty with tenure.

While there is adequate room, on the Internet, for all types of scholarly publishing activities, there is also a continued role for commercial and academic publishers. Even as the fee system, copyright laws and expectations, and publication process evolve to encompass new electronic media, the basic role of scholarly publication as a means towards achieving tenure will remain. Indeed, even in many current academic environments where the role of tenure is changing, there still exists the need for scholars to self-legitimize through publications, in order to maintain or increase their academic status.

In 1997, there is a tremendous demand for quality control in electronic information. The level of interest in the Internet expressed by corporations that already dominate Western media and communications makes clear that the obvious and easiest means of judging quality will be by source, not content. This is the same reason why public-access cable television is not popular, yet dreary situation comedies are--the glitter, the color, and the snappy patter that media corporations produce cannot be matched by a single creative individual with a camcorder.

Similarly, we can expect that brilliant scholarly publications will have difficulty reaching their widest audience unless they are published by an important publisher or written by an already important author. There is still plenty of room to bypass the major players in the scholarly publishing field (whoever they turn out to be), just as independent films can win awards and independent music labels can get mass-market airplay. The 80/20 rule still applies: 80% of the material we see will come from 20% of the sources. Current television, newspaper, and radio ownership is closer to a 99/1 rule, as fewer than 20 companies control 99 percent of the mass media in the United States in 1997. The democratic nature of the Internet, such as it is, combined with the specialized needs of the scholarly community, can give us hope that the ratio will be more favorable.

Good Signs

The overall picture presented here is one of some challenges, but considerable progress towards meeting those challenges. Perhaps the largest single force is the desire of scholars to participate in the electronification of scholarly publishing. It is in our best interest for our publications to be widely and instantly available, and to avoid at least some of the delays inherent in the print publication process. From the consumer end, what scholar or student has not found it more convenient or expedient to search the Internet for publications of interest, rather than the library card catalog?

There is no reason why scholars cannot list electronic publications on curriculum vitae, and, provided they are Internet users, no reason why members of tenure review committees cannot take them into account. Perhaps the names of the new electronic journals will not be familiar, but the sponsoring institutions, editors, reviewers, or other authors may be.

Academic and commercial scholarly publishers have been relatively slow to move wholesale to electronic format, but almost all are interested and have some active projects. The level of maturity found in transmission protocols such as HTTP, and the extent to which expectations for royalties and subscriptions are reasonable, seem to indicate that there is indeed little reason to hurry, lest the hurrying lead to poor products or lost profits.

The Internet as a whole, and the means we use to communicate, store, and transmit information, is not yet in a nearly finished state. There is every reason to suspect that the desktop computer of the near future is today's supercomputer; that today's T1 network connection is tomorrow's modem; that interactive graphics and displays of tomorrow will make today's VR games look like "pong." Even if problems of effective retrieval from full text databases prove difficult, we will be able to engineer current means for searching to work more effectively with electronic publications.

This work has attempted to paint a realistic picture of ongoing activities and some important challenges in the move towards the electronification of scholarly publishing. It is accepted at the outset that scholarly publishing as we know it will take place largely in electronic formats. The exact timing of this change is difficult to predict, as is the timing for overcoming specific challenges discussed here. On the whole, though, there are no problems that appear intractable, and enough interest in solving them from outside the academic world (media outlets; microcomputer vendors; database providers; banks...) that we can expect these problems to be solved fairly rapidly. New problems will arise, no doubt, and the road to scholarly publishing of 2010 or 2020 will be rocky. Even though the destination is unclear, the path for the upcoming few years is before us.

References

Fisher, Janet. 1996. Traditional Publishers and Electronic Journals. In Peek, Robin P. & Newby, Gregory B. (Eds.). Scholarly Publishing: The Electronic Frontier. Cambridge, Mass.: The MIT Press.

Harman, Donna. 1994. TREC-4 Proceedings. Gaithersburg, Maryland: National Institute of Standards and Technology.

Harnad, Stevan. 1996. Implementing Peer Review on the Net: Scientific Quality Control in Scholarly Electronic Journals. In Peek, Robin P. & Newby, Gregory B. (Eds.). Scholarly Publishing: The Electronic Frontier. Cambridge, Mass.: The MIT Press.

Newby, Gregory B. 1996. Digital Library Models and Prospects. In Proceedings of the American Society for Information Science Mid-Year Meeting. Medford, New Jersey: Learned Information.