BIBLIOGRAPHIC INTEGRATION AND THE INTERNET
Moving beyond "Cataloguing" in Academic and Research Libraries
Toby Burrows, Divisional Librarian (Technical Services), University of Western Australia, Nedlands 6907 W.A. (tburrows@library.uwa.edu.au)
(Paper presented at the 10th National Cataloguing Conference, Fremantle, W.A., November 1993)
INTRODUCTION
My theme today is the future, but I'd like to begin by asking you to cast your mind back over a thousand years - to medieval England in about the year 790. In that year, the eminent scholar Alcuin drew up a kind of catalogue of the books held in the library of the cathedral at York. Alcuin - as I'm sure you know - later became famous as the man who directed the intellectual activities of the Empire of Charlemagne.
The interesting thing about his catalogue was that it was written as part of a poem. I'll quote a little of it:
The writings of Victorinus and Boethius,
and the ancient historians Pompey and Pliny,
keen-minded Aristotle and Cicero the great rhetorician;
all the poetry of Sedulius and Juvencus... (1)
I mention this, not to suggest a poetic approach to cataloguing (interesting though that might be!), but to illustrate the antiquity of the activity that brings us together for this conference. The earliest surviving monastic library catalogues date from Alcuin's time. They are author/title listings and sometimes have a kind of subject arrangement as well.
In late Latin, the word "catalogus" meant "a list." And the compiling of lists of manuscripts was, I think, the first service offered by librarians which went beyond the simple collecting of materials into one place. Cataloguing, or listing, was the first "value-added" service offered by librarians, to use a very fashionable phrase. It added considerably to the value of having a library of manuscripts to be able to scan a list of the contents of the library.
A great deal has happened since Alcuin's time - even in cataloguing. In the nineteenth century, in particular, vastly increased numbers of publications poured off the newly invented rotary printing presses, overwhelming cataloguers and making the existing methods of cataloguing unworkable. Cataloguers employed at the British Museum by Sir Anthony Panizzi even took to drink in the face of these apparently insuperable difficulties. (2)
The response, led by people like Panizzi, was to develop the kinds of detailed cataloguing rules and codes with which we are so familiar. And yet the essential purpose of cataloguing remained the same: listing materials held in the library in such a way that authors, titles, and subjects were grouped together. Even now, despite such recent inventions as the on-line catalogue, the MARC record and the bibliographic utilities, we still catalogue within this same generic framework.
But we now have the good fortune, or misfortune, to be working in a time when these traditional certainties are being unravelled. What I would like to do today is to look beyond "cataloguing" as we know it, to the future of bibliographic work in the academic library and eventually all libraries.
The major agent of this unravelling is the Internet, the international network which links over 16,000 computer networks in sixty countries. The Internet provides access to a vast amount of material: library catalogues, bibliographic databases, files of documents, computer programmes, electronic mail groups, news groups, full-text databases, and electronic journals and newsletters.
Its recent growth has been remarkable. (3) In the last twelve months, network traffic has doubled and the number of constituent networks has risen by 150%. In the last two years, network traffic has grown by nearly 300%, and the number of constituent networks by nearly 400%. Traffic on the Internet has, in fact, risen over the last five years by no less than 12,000%! Even our own branch of the Internet - known as AARNet - is doubling in use every ten or eleven months.
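As a rough check on these figures, the sketch below converts a percentage rise over a period into an implied doubling time. It assumes steady exponential growth across the whole period, which the statistics themselves do not guarantee. The function name is purely illustrative, and "nearly 300%" is treated as exactly 300.

```python
import math


def doubling_time_months(percent_increase: float, period_months: float) -> float:
    """Months needed to double, assuming steady exponential growth."""
    # A rise of 100% means the quantity has doubled, i.e. a factor of 2.
    growth_factor = 1 + percent_increase / 100
    return period_months * math.log(2) / math.log(growth_factor)


# Figures quoted in the text (the rounding is an assumption).
one_year = doubling_time_months(100, 12)      # traffic "has doubled" in twelve months
two_year = doubling_time_months(300, 24)      # "nearly 300%" in two years
five_year = doubling_time_months(12_000, 60)  # "12,000%" in five years

for label, months in [("1 year", one_year), ("2 years", two_year), ("5 years", five_year)]:
    print(f"{label}: doubling about every {months:.1f} months")
```

On this assumption, the one-year and two-year figures both imply a doubling about every twelve months, while the five-year figure implies a doubling roughly every eight to nine months.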
The growing size and use of the Internet are increasingly affecting the academic world in a profound way.
I think we can now say with reasonable certainty that the Internet will eventually replace the printed journal as the primary means for communicating academic research. This may take some time to eventuate. But journal publishers are already testing ways of providing their journals electronically over campus networks. Elsevier has its TULIP project, and Springer has its Red Sage project. On a far more humble scale, at the University of Western Australia, the journal published by our Department of Education is now only available in electronic form. Even Westerly, that eminently literary magazine, will be publishing an electronic version next year.
More importantly for us and for librarians generally, computer networks are already close to becoming the dominant means of conveying bibliographic information. Catalogues, indexes and reference works of all kinds are increasingly consulted in their on-line, networked forms. Academics are rapidly discovering the bibliographic riches available through the Internet. As a small example of this, one of our academics is reported to prefer searching the MELVYL system at the University of California over the Internet, because he finds it more helpful than our own URICA system.
A crucial point in these developments is that academics can connect directly to these services. The Internet can be explored from the academic's desk, without the need for a librarian to mediate or intervene. In fact, the individual academic is being encouraged by organizations like ABN, OCLC and UnCover to use their services directly. This conjures up the vision of academics finding bibliographic information and getting documents delivered without using a local library at all.
The role of the academic library in this context requires urgent re-thinking. But it is clear that one major function will be to ensure that this direct access is effective, easy to use, and well-organized. Bibliographic structures will be a critical element in this.
BIBLIOGRAPHIC ACCESS AND THE INTERNET
On the Internet, a library catalogue (enhanced or unenhanced) is only one type among a range of bibliographic resources. So too is the expanded catalogue cum database like Stanford University's FOLIO Information System or the University of California's MELVYL system. What is needed above all on the Internet is to integrate access to all the available services and data, including library catalogues.
I would make a distinction here between integration and connectivity. Connectivity is inherent in the structure of the Internet, which exists to link computers and networks from all over the world. There are various generic programmes which enable data to be transmitted across the Internet, such as electronic mail for sending messages, File Transfer Protocol (better known as FTP) for transferring files, and telnet for logging in to a remote computer to use its software. These are the applications which act as the basic connecting devices of the Internet.
But integrating the Internet is a different matter. There have been several attempts at integration over the last three years. They were designed by computer experts, not by librarians, and most are fairly simplistic and mechanical. (5) The three main tools are Archie, WAIS, and Gopher.
ARCHIE
The oldest of these retrieval tools is Archie, which became available in November 1990. (6) Archie indexes the names of over two million files at over twelve hundred sites around the world. It can be searched by using telnet to log in to an Archie server machine, such as the one maintained by AARNet. Or you can use what is called client software running on your own machine, which will connect to and "talk to" the Archie server. There is a Windows type of Archie client, for example, which is much easier to use than telnetting to Archie.
Archie has two major weaknesses. It can only be searched by exact or close matches between a search term and a file name. Searching by subject is extremely hit-or-miss, and depends entirely on how well a file's name both matches your search term and reflects the file's contents. Archie's other weakness is that it is limited to files. Other resources on the Internet - catalogues, databases, newsgroups - are not listed.
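The file-name matching described here can be illustrated with a minimal sketch. It is not Archie's actual code: the host names, paths, and function names are all hypothetical, and the "close match" is reduced to a simple substring test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    site: str  # hypothetical anonymous-FTP host
    path: str  # directory and file name on that host


# A toy index of three files; the real index lists millions of names.
INDEX = [
    FileEntry("ftp.example.edu", "pub/docs/cataloguing-rules.txt"),
    FileEntry("ftp.example.org", "pub/library/marc_format.zip"),
    FileEntry("ftp.example.net", "pub/misc/aacr2.ps"),
]


def file_name(entry: FileEntry) -> str:
    """Return only the final component of the path."""
    return entry.path.rsplit("/", 1)[-1]


def search(term: str, exact: bool = False) -> list[FileEntry]:
    """Match the term against file names only, never against contents."""
    term = term.lower()
    hits = []
    for entry in INDEX:
        name = file_name(entry).lower()
        matched = (name == term) if exact else (term in name)
        if matched:
            hits.append(entry)
    return hits


print(search("cataloguing"))           # finds the file whose name contains the term
print(search("aacr2.ps", exact=True))  # an exact file-name match
print(search("bibliography"))          # nothing is named this way, so no hits
```

A search for "cataloguing" in this toy index finds the first file but misses aacr2.ps, even though a file with that name is likely to concern cataloguing rules. This is the hit-or-miss effect of relying on names alone.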
When Archie is used through telnet, there is no automatic link for retrieving a file which has been identified. (No document delivery service, in paper terms.) The user must know how to use the FTP process to connect to the site which has that file. The Windows client for Archie, however, allows this process to be invoked with a minimum of fuss.
There are some related but more experimental tools which are intended to help with organizing and retrieving files over the Internet. Two file systems called Alex and Prospero allow browsing through directories of files which are available for transfer. They can then be transferred using FTP. (7) Alex, in particular, can be integrated with Archie. But Prospero and Alex don't help with searching by subject for relevant files.
WAIS
Another important network retrieval tool is the Wide Area Information Server, known from its initials as WAIS. (8) WAIS is designed to allow searches of large full-text databases using uncontrolled keywords. It uses client software running on your local computer, in Macintosh or Windows format, to send queries to databases on remote servers which have been indexed by the WAIS software.
The server returns a list of documents which match the search, and ranks them according to the number of times the keywords occur. You can then refine the search further, if necessary, and search again. This process is called "relevance feedback." There are over three hundred WAIS servers on the Internet, offering everything from collections of poetry to weather maps. There is even a WAIS server which contains summaries of all the episodes of The Simpsons!
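Ranking by keyword occurrence can be sketched in a few lines. This is an illustration of the general idea only, under the assumption that each document is scored by a plain count of query words; the real WAIS software is more elaborate, and the document names and texts below are invented.

```python
import re
from collections import Counter

# Hypothetical full-text "database" of three short documents.
DOCUMENTS = {
    "poem.txt": "The library, the library, a list of books in verse",
    "weather.txt": "Weather maps and forecasts for Western Australia",
    "guide.txt": "A guide to the library catalogue and its books",
}


def tokenize(text: str) -> list[str]:
    """Split text into lower-case words (uncontrolled keywords)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query: str, documents: dict[str, str] = DOCUMENTS) -> list[tuple[str, int]]:
    """Return (document, score) pairs, highest keyword count first."""
    terms = set(tokenize(query))
    scored = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((name, score))
    # Sort by descending score, then by name so that ties are stable.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


print(rank("library books"))
print(rank("simpsons"))
```

With these sample texts, the query "library books" ranks poem.txt above guide.txt because the keywords occur three times in the former and twice in the latter. No account is taken of what either document is actually about.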
The attraction of WAIS is undoubtedly that the indexing is purely mechanical. Its "relevance feedback" searching is touted as somehow closer to human intuition than controlled indexing is. But debate has raged for years over the merits of controlled versus uncontrolled subject access, and there are strong arguments against the WAIS approach. If a query is too specific, you'll get no result. If it is too general or ambiguous, you are likely to get huge amounts of irrelevant junk!
Interesting efforts are being made to develop and extend the WAIS approach. One is called Dynamic WAIS. This allows other network tools like Archie to be searched with the WAIS software, using a gateway between a WAIS server and an Archie server.
Another important development involves using WAIS to search library catalogues and bibliographic databases. The Australian National University, for example, has made its on-line serials catalogue available experimentally as a WAIS server over the Internet. Rice University in Texas has made the ISI Current Contents database available through a WAIS server.
WAIS is based on something called the Z39.50 protocol. This is a software specification which allows a single local client to be used for searching remote catalogues and databases. Instead of having to learn to use all their different varieties of software, you can stick with the same software for everything - as long as the database at the other end is set up as a Z39.50 server. Several American university libraries are testing Z39.50 applications, and it is also being implemented by library system suppliers like DRA and Innovative Interfaces. One goal of the Z39.50 protocol is to permit multiple remote databases to be searched simultaneously, and this is already possible with the WAIS software.
GOPHER
The third major network retrieval tool is Gopher. Gopher differs from the other two in that it relies on browsing rather than searching, and uses a hierarchy of menus to provide access to network resources. (9) Although it is only two years old, Gopher is transforming the use of the Internet and is rapidly becoming the most popular Internet tool of all. There are 100 new Gopher servers coming on to the Internet each month at present.
Gopher uses client/server software like Archie and WAIS. By simply selecting or clicking on a menu item, you can read documents and transfer files, without having to know where the items are physically located or how the connections are made. Gopher even connects to graphics files and sound files, though additional software is needed to display them properly. There are also links to applications which are not running under Gopher software, such as library catalogues, WAIS servers, and Archie.
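The reason the user need not know where an item is located is that each menu line sent by a Gopher server carries the location with it. In the Gopher protocol a menu line consists of a one-character item type, a display title, a selector string, a host name, and a port number, separated by tab characters. The sketch below parses such a menu; the sample menu and host names are hypothetical, and only a few item types are labelled.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    kind: str      # human-readable item type
    title: str     # the text the user sees in the menu
    selector: str  # string sent back to the server to fetch the item
    host: str      # machine that actually holds the item
    port: int


# A few item-type characters from the Gopher protocol; others are lumped together.
ITEM_KINDS = {"0": "text file", "1": "menu", "7": "search", "8": "telnet session"}

# Hypothetical menu as a server might send it; a lone "." ends the listing.
SAMPLE_MENU = (
    "1Library resources\t/library\tgopher.example.edu\t70\r\n"
    "0About this server\t/about.txt\tgopher.example.edu\t70\r\n"
    "8Library catalogue\t\tcatalogue.example.edu\t23\r\n"
    ".\r\n"
)


def parse_menu(raw: str) -> list[MenuItem]:
    """Turn raw menu text into a list of MenuItem records."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # end-of-menu marker
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:  # skip malformed lines
            continue
        title, selector, host, port = fields[:4]
        kind = ITEM_KINDS.get(line[0], "other")
        items.append(MenuItem(kind, title, selector, host, int(port)))
    return items


for item in parse_menu(SAMPLE_MENU):
    print(f"{item.title} ({item.kind}) -> {item.host}:{item.port}")
```

The client displays only the titles. When an item is chosen, it connects to the host and port recorded in that line, which may well be a different machine from the one that supplied the menu.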
Gopher is extremely simple to use, and integrates a huge range of menu items - at least one and a half million in all. But it has serious limitations. One is the depth and complexity of the menu structures; it can take many steps to get from a general menu to a specific item. The menus of Gopher servers are often poorly designed and unsystematic, and overlap with those of other Gopher servers. The descriptors for menu items are often vague and ambiguous. There is great redundancy, with many Gopher servers all pointing, in different ways, to copies of the same files and documents.
A new feature - called Veronica - was introduced twelve months ago, to try to assist voyagers in what is known as Gopherspace. Veronica allows you to do a keyword search of all Gopher menu titles in the world, and then to connect to the resulting items. Veronica tends to be very slow, and suffers from the same problems as WAIS searches. A related and newer tool, known as Jughead (continuing the Archie comics connection), allows these searches to be limited to parts of Gopherspace.
There's another small Gopher enhancement, which I saw announced recently, that I can't resist mentioning. It's designed to take a list of the WAIS servers and organize it to appear as menu choices on a Gopher server. Its name, funnily enough, is Alcuin, which only goes to show that medieval scholars are among the denizens of Gopherspace! (10)
Gopher is clearly here to stay, despite its shortcomings, and there are currently vigorous debates about improving it from a bibliographical perspective. Two major issues are being debated at present.
Firstly, what's the best way of organizing Gopher menus within a subject framework? Some people have been advocating the use of subject headings, either LCSH or various "home-made" schemes. Others think that classification schemes should be used. The ANU Library Gopher, for instance, uses the Library of Congress classification as the basis for its presentation of Internet resources by subject. There's an interesting experiment with classification schemes going on at Lund University in Sweden. They have written a programme which automatically classifies WAIS databases using UDC, and then groups them into a subject tree within their Gopher server.
The other major current issue is how to link library catalogues and bibliographic databases into the Gopher structure. Quite a few Gopher servers include gateways which connect to catalogues or databases using telnet. But there are several projects which go beyond this, and allow you to search a catalogue or database without leaving the Gopher framework. Rice University offers Current Contents and three other databases for searching from the Gopher, which acts as a client to WAIS server software. The University of Minnesota offers Current Contents in a different way, with each issue appearing as a Gopher menu item. Most interesting, though, is the BIBSYS database, a kind of Norwegian equivalent to ABN, containing over a million bibliographic records from 29 academic and research libraries. It can be searched directly using the Gopher software, which builds sub-menus in response to each search. (11) Experiments like these may well lead to a whole new approach to providing Internet access to library catalogues and bibliographic databases.
MOSAIC
As Gopher, WAIS and Archie show, there have been considerable efforts to integrate the resources available on the Internet. But these approaches remain quite rudimentary, particularly in view of the increasingly gigantic volume of material involved. There is a clear need now for a further level of integration, one which links all these methods in a more sophisticated way.
A possible candidate for this is a programme called Mosaic. (12) Mosaic is essentially client software for the network information retrieval system known as the World Wide Web. The Web is a distributed hypertext and hypermedia system, which spans and links the resources of the Internet, using as its base the same principles as the Macintosh HyperCard software. The Web is formed from a series of interlinked documents on local servers, including graphics and sound files. Each document has pointers embedded in it, linking it to other documents in the series. The Mosaic browser allows you to follow these links around the Web, in whatever order you choose.
The Web also contains gateways to Gopher, WAIS, news groups and Archie. And Mosaic can employ the keyword searches in WAIS and Archie, as well as using its own HyperCard type of approach.
Mosaic, like the Web, is based on a means of identifying and locating specific Internet resources using a Uniform Resource Locator, or URL. URLs are a way of providing unique addresses for all kinds of Internet resources, by specifying their type, host computer, directory, and name.
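The four parts of a URL named here (type, host computer, directory, and name) can be picked apart with Python's standard urllib.parse module, as in the sketch below. The addresses are hypothetical examples, and the function name is illustrative only.

```python
import posixpath
from urllib.parse import urlsplit


def describe(url: str) -> dict[str, str]:
    """Break a URL into its type, host computer, directory, and name."""
    parts = urlsplit(url)
    directory, name = posixpath.split(parts.path)
    return {
        "type": parts.scheme,         # the access method, e.g. ftp or http
        "host": parts.hostname or "",  # the computer holding the resource
        "directory": directory,
        "name": name,
    }


# Hypothetical addresses for a file on an FTP site and a Web document.
print(describe("ftp://ftp.example.edu/pub/docs/guide.txt"))
print(describe("http://www.example.org/hypertext/overview.html"))
```

The same four-part pattern serves for quite different kinds of resource, which is what allows a single client like Mosaic to address them all.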
What Mosaic lacks - like the other Internet tools - is subject access as we know it from the library catalogue and many bibliographic databases: that is, a combination of controlled vocabulary, direct access to records by a subject term search, and human-assigned indexing.
FUTURE DIRECTIONS
This is probably an appropriate point to return to my central question: what is the future role of cataloguers and cataloguing in this world of the Internet? Is it, as a recent OCLC report recommended, to create MARC records in the catalogue for Internet resources? (13) I don't think so - at least not within the present on-line catalogues, where there can be no direct connection from the bibliographic record to the Internet resource itself.
The real task before us is to build an integrated bibliographic architecture for Internet resources, especially in the area of subject access. This is nicely summed up in the recent A.N.U. statement on directions in information technology for the next five years:
"The Library will add value to scholarly information, by organising the content of and access to such information available on the global network." (14)
This is as neat a statement of the future of cataloguing as one could wish for!
In more specific terms, this will involve at least six main lines of approach:
* first, developing a local library Gopher server, with a local menu structure and pointers to locally relevant resources;
* second, helping to promote and teach the use of the existing network retrieval tools;
* third, contributing to the development and application of new retrieval tools, especially those aimed at improving subject access to the Internet and at simultaneous searching of multiple databases;
* fourth, developing ways of integrating the local catalogue with Internet retrieval tools;
* fifth, constructing and developing local bibliographic databases for integration into the Internet;
* and sixth, finding ways of linking bibliographic records directly to their electronic texts.
Perhaps the most problematic of these tasks is the third. Any new retrieval tool will have to use the existing Internet architecture, which is based on mechanical indexing and the client/server model. A tool involving controlled, human-assigned indexing will be resisted by academic library administrators on the grounds of cost. "Cataloguing what we hold is expensive enough - how can we afford to catalogue things we don't even hold?" Persistent lobbying and persuasion will be needed if such attitudes are to be changed.
Commercial firms may end up doing the job for us, just as they provide the rich subject access of services like BIOSIS and PsycLIT. The high cost of access to these commercial services is the price we pay for their richness, however. There is already a commercial information service on the Internet. The Global Network Navigator, which was launched last month, uses Mosaic and the World Wide Web and is funded by sponsorship and advertisements.
In the present climate for university libraries, it will be difficult to allocate or reallocate staff and funds to these network "knowledge management" activities. But I feel it can, and should, be done.
These tasks are potentially of greater value and importance to the academic community than original cataloguing and reference desk duties. University libraries are accelerating their shift away from local ownership of printed materials, and researchers and publishers are accelerating their shift towards electronic, networked resources. There is a great challenge here to develop new forms of bibliographic access and organization, and cataloguers have the best credentials for this task. By grasping this opportunity, we can take the art of cataloguing into quite different realms.
___________________________________________________________
Notes
1. Alcuin, The bishops, kings, and saints of York, ed. Peter Godman (Oxford: Clarendon Press, 1982), p. 124-7.
2. Valauskas, Edward J., "One-stop Internet shopping: NCSA Mosaic on the Macintosh", Online 17(5) (Sept. 1993), p. 99-101.
3. Internet / NSFNET statistics are available via FTP from nic.merit.edu (directory: nsfnet/statistics), or via Gopher to: InterNIC Gopher, under "InterNIC Information Services".
4. Lynch, Clifford A., "Beyond the ordinary card catalog: MELVYL learns from years of experience", EDUCOM review 27(6) (Nov./Dec. 1992); Potter, William Gray, "Expanding the online catalog", Information technology and libraries 8 (1989), p. 99-104; Troll, Denise A., "Information technologies at Carnegie Mellon", Library administration and management 6 (1992), p. 91-9.
5. December, John, Internet tools summary (FTP from ftp.rpi.edu, file: pub/communications/internet-tools); Foster, Jill, George Brett and Peter Deutsch, A status report on networked information retrieval: tools and groups (FTP from mailbase.ac.uk, file: pub/nir/nir.status.report); Krol, Ed, The whole Internet user's guide and catalog (Sebastopol, Calif.: O'Reilly & Associates, 1992).
6. Simmonds, Curtis, "Searching Internet archive sites with Archie", Online 17(2) (Mar. 1993), p. 50-55.
7. Schwartz, Michael F., et al., "A comparison of Internet resource discovery approaches", Computing systems 5 (1992), p. 461-493.
8. Stein, Richard Marlon, "Browsing through terabytes", Byte 16(5) (May 1991), p. 157-164; Kahle, Brewster, and Art Medlar, "An information system for corporate users: Wide Area Information Servers", Online 15(5) (Sept. 1991), p. 56-60; Dern, Daniel P., "Index everything, share it companywide with WAIS", MacWeek 6(38) (26 Oct. 1992), p. 24.
9. Notess, Greg R., "Using Gophers to burrow through the Internet", Online 17(3) (May 1993), p. 100-102.
10. Morgan, Eric Lease, "Alcuin: organizing WAIS indexes" (message posted to newsgroup PACS-L (Public Access Computer Systems Forum), 4 May 1993).
11. Tennant, Roy, "Gopher access to large bibliographic databases" (message posted to list GO4LIB-L (Library Gopher List), 23 Sept. 1993).
12. Valauskas, op. cit.
13. Dillon, Martin, et al., Assessing information on the Internet: toward providing library services for computer-mediated communication (Dublin, Ohio: OCLC, 1993). See also: Caplan, Priscilla, "Cataloging Internet resources", Public access computer systems review 4(2) (1993), p. 61-66.
14. Information technology directions statement, 1993-1997 (Canberra: Australian National University, 1993), p. 11. See also: Weider, Chris, and Peter Deutsch, A vision of an integrated Internet information service (Internet-draft, IETF IIIR Working Group) (FTP from ds.internic.net, file: internet-drafts/draft-ietf-iiir-vision-00.txt), section 4.2.