Publications and Bibliographies

ICPSR Web Site (http://www.icpsr.umich.edu)

Reviewed by Royce Kurtz, University of Mississippi Libraries, September 23, 1997

Introduction:

The Inter-university Consortium for Political and Social Research (ICPSR) is a nonprofit archive for machine-readable data related to social science research. Founded in 1962, ICPSR is the largest archive of such data in the United States. As a nonprofit consortium, it provides data and services to over 350 academic institutions. The archive is a subdivision of the Institute for Social Research (ISR) at the University of Michigan. ISR is nationally known for conducting extensive, long-running surveys, such as the Panel Study of Income Dynamics, Monitoring the Future, and the Study of American Families. Data sets compiled by ISR as well as those acquired from U.S. government agencies, private foundations, international organizations, and individual researchers form the basis of ICPSR's collection of 4,200 data sets. The archive adds about 200 titles a year. Access to the titles in the archive has been through an annually published Guide to Resources and Services, now 915 pages (1996-97 last paper edition). The Guide is updated by a quarterly bulletin. In 1993 ICPSR launched a Web site that essentially functions as an electronic version of the Guide and the quarterly bulletins and also provides the ability to download code books and data sets.

Scope and Coverage:

The archive has few guidelines governing the scope and coverage of its collection. Guidelines for depositing data deal not with subject appropriateness but with technical issues. Besides the statement in the Guide that its "holdings cover a broad range of disciplines, including political science, sociology, demography, economics, history, education, gerontology, criminal justice, public health, foreign policy, and law," and that "ICPSR encourages social scientists in all fields to contribute," no other statement defines the archive's collection policy.

Individual scholars contribute data sets to the ICPSR archives. Foundations and agencies contract with ICPSR to archive and distribute their data sets, e.g., the National Archive of Computerized Data on Aging funded by the National Institute on Aging. Public health care data collected under grants from the Robert Wood Johnson Foundation Archive are made available through ICPSR. In 1997 ICPSR was awarded a contract by the Substance Abuse and Mental Health Services Administration to establish a National Archive and Analytical Center for Alcohol, Drug Use, and Mental Health Data. ICPSR is also creating an International Archive of Education Data sponsored by the National Center for Education Statistics. ICPSR also purchases data sets from a number of sources.

ICPSR is committed to acquiring and maintaining new versions of around one hundred serial data sets. These data sets include those produced by private agencies or individuals, such as the ABC News/Washington Post polls; U.S. government agencies, such as the Census Bureau's Current Population Surveys; international organizations, such as the International Monetary Fund's Government Finance Statistics; and other organizations, such as the Center for Political Studies' National Election Studies.

The overwhelming majority of data sets focus on U.S. social issues. Fewer than 15% of the archive's holdings are titles concerned with international or foreign--mostly European--studies.

Format:

The ICPSR home page has links to a "Table of Contents" and ICPSR's parent organizations. There is also a header, repeated on successive Web pages for easy navigation, with links to the archive, major subsets of the archive, ICPSR's summer programs, other Web sites, and contacts for help.

The Table of Contents screen provides an outline of ICPSR's Web site; it has links to six major sections, each with numerous subsections. The first three sections, "About ICPSR," "Membership," and "Governance," describe ICPSR's organization. "About ICPSR" includes a link to contact people, categorized by job description and listed with phone number and e-mail address, and a link to the full text of recent ICPSR newsletters and bulletins.

The other three major sections are "Archive," "Other Resources," and "Recent Developments." "Recent Developments" has links to the General Social Survey and American National Election Survey sites, Eurobarometer mail list, a list of available CD-ROMs, and recent policy statements. "Other Resources" has more policy statements, summer program schedules and course descriptions, a link to the Publication Related Archive, and "Other Data Sites."

The key section, "Archive," subsection "Access Holdings," is the electronic version of the Guide and the main access to the data collections. "Archive" also links to three subsets of the archive: "Recent Additions-Updates," "Data on Aging," and "Criminal Justice Data," as well as a "Documentation List" showing price and availability of code books in electronic or paper format, and more policy statements.

Record Structure of Data Archive:

The first link under "Archive," labeled "Access Holdings," leads to a page titled "ICPSR Data Archive." This page allows a researcher to keyword search the archive, browse using eighteen main subject headings, or search by Title, Principal Investigator (PI), and Study Number indexes. A main subject division link leads in turn to subheadings. The final result of a keyword or browse search is a neatly framed alphabetic list of appropriate titles prefaced by AB for "Abstract" and DA for "Data." AB links to a long citation format that includes a detailed abstract. DA links to a page that represents a powerful research feature of this Web site, the ability to immediately download code books and data sets. If the code book is available electronically, the page displays the phrase "Codebooks and Documentation Freely Available." The code book can often be keyword searched from this screen. The search results display the line number and the phrase on that line in which the keyword is found. The ability to freely download or search the code book on the Web is a powerful research tool. Researchers may easily determine whether data sets contain answers to certain questions and how those questions and responses are framed, thus determining quickly and accurately the usefulness of a data set. Below the code book information, the "Data Files" are neatly framed and presented. In most instances the message "Access restricted to authorized users" is displayed, but for member institutions or for publicly accessible data sets retrieval can proceed.
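The code book search described above, which reports the line number and the text of each line containing the keyword, can be sketched in a few lines of Python. This is only an illustration of the behavior as the reviewer describes it, not ICPSR's implementation; the function name search_codebook and the code book excerpt are hypothetical, and case-insensitive substring matching is an assumption.

```python
def search_codebook(text, keyword):
    """Return (line_number, line) pairs for lines containing keyword."""
    needle = keyword.lower()
    hits = []
    # Line numbers start at 1, as a reader would count them.
    for number, line in enumerate(text.splitlines(), start=1):
        if needle in line.lower():
            hits.append((number, line.strip()))
    return hits


# Hypothetical code book excerpt, invented for illustration only.
codebook = """VAR 0001  Respondent identification number
VAR 0002  Did respondent vote in the last election?
VAR 0003  Age of respondent in years
VAR 0004  Party respondent voted for"""

for number, line in search_codebook(codebook, "vote"):
    print(f"{number}: {line}")
```

A listing of this kind lets a researcher see at a glance which variables touch on a topic and how the questions are worded before requesting the data set itself.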

The format and arrangement of a data set citation on the AB Web page follow the paper Guide. Each data set entry has a field for principal investigator or investigators, a descriptive title, a unique ICPSR study number, and a summary or abstract of the data set that may be several paragraphs long. Other descriptive fields in the entry include technical features, such as universe, sample size, data format, number of cases, and records per case. Each of these fields, prefaced by a word or phrase "field identifier," links to a full explanation of that identifier. Another detail coded into each record is the amount of processing or editing done to the data and code books by the ICPSR staff. Data sets may be reorganized and the code books fully revised, or no editing may take place at all. Finally, a list of "related publications" cites important publications based on the data set. This standard format has been modified slightly through the years, but older entries have generally not been changed to reflect new standards or editing styles.

ICPSR has separate Web pages for "Data on Aging" and "Criminal Justice Data." These pages are arranged like the main "ICPSR Data Archive" page and have the same keyword searching features.

Subject Access to Data Archive:

On the "ICPSR Data Archive" page, data set citations are organized into eighteen broad subject categories, such as "Census Enumerations," "Health Care and Health Facilities," and "Geography and Environment." Major subject headings may have several subheadings. A common subheading identifies data sets from "Nations Other than the United States." Data set citations appear under only one heading.

A simple keyword search engine is found on the "ICPSR Data Archive" page. A pull-down menu allows a search in the title, investigator, or abstract field, but not in combination. A second pull-down menu allows the selection of a "word" or "string" search. A "Help" link produces a screen with five short instructions: searches are not case sensitive; the boolean operator "and" is assumed between words; quoted queries are treated as single terms; word searching, which requires an exact match, works only for abstract searching; and string searching is a truncation device. For example, the string search "vot" gets vote, voting, votes, and other variations. The search engine does not support the boolean operator "or" or nesting of terms with parentheses. This presents minor problems in searching because there is no thesaurus or controlled vocabulary. One cannot use "or" for near-synonymous words like voting and election. The string search--minorit vot--produced eighteen hits; the search--minorit election--produced fifteen hits.
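The search rules listed on the Help screen can be made concrete with a short Python sketch. It is a rough model of the described behavior (case-insensitive matching, an implied "and" between terms, quoted phrases as single terms, exact "word" matching versus truncating "string" matching), not the engine ICPSR actually runs. The function names and the three abstracts are hypothetical, and the final step shows one way a user could work around the missing "or" by merging two result lists.

```python
import re


def parse_terms(query):
    """Split a query into lowercase terms, keeping quoted phrases whole."""
    pairs = re.findall(r'"([^"]+)"|(\S+)', query.lower())
    return [quoted or bare for quoted, bare in pairs]


def matches(query, text, mode="string"):
    """Return True when every term is found in text (implied 'and')."""
    text = text.lower()
    for term in parse_terms(query):
        if mode == "word":
            # Word mode: the term must appear as a whole word (exact match).
            found = re.search(r"\b" + re.escape(term) + r"\b", text) is not None
        else:
            # String mode: plain substring match, which acts as truncation.
            found = term in text
        if not found:
            return False
    return True


# Hypothetical abstracts, invented for illustration only.
abstracts = [
    "Minority voting patterns in municipal contests",
    "Minority candidates in state elections",
    "Household income and employment",
]


def search(query, mode="string"):
    """Return the abstracts that satisfy the query."""
    return [a for a in abstracts if matches(query, a, mode)]


# Emulating the unsupported "or": run both searches, merge without duplicates.
first = search("minorit vot")
second = search("minorit election")
either = first + [a for a in second if a not in first]
print(either)
```

Because the two result sets can overlap, the merged list may be shorter than the sum of the two hit counts, which is why running separate searches by hand is a clumsy substitute for a true "or" operator.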

Time Lag:

Updates of old titles and announcements of new titles are accessible through a link on the "ICPSR Data Archive" page. As of the date of this review (September 23, 1997), "Recent Additions" listed 22 titles added between July 29 and September 19, 1997. The data sets were compiled between 1970 and 1996, with 1994 being the median. Researchers and organizations gathering data want to make use of their information before passing it on for general use, thus delaying release dates. ICPSR then performs various checks on the data and code books, reformatting many of them for general use, which also delays the release of data. As the editing process can be lengthy, ICPSR has established a FastTrack service, linked under "Electronic Services" on the "Contents" page, in which a select number of new studies are made available only through anonymous FTP before ICPSR has edited them. These studies are not cross-listed in the "ICPSR Data Archive." As of September 23, 1997, twelve studies were available through the FastTrack service.

Editing:

ICPSR has kept its Web page clean and simple. Bold type, underlining, and type size are used effectively to highlight and organize each page. The use of framing to present lists of data set titles, downloadable data sets, and code book options is simple and effective. As much of the text was part of the paper Guide, years of editing have minimized errors. The layering of screens follows a reasonable hierarchy from general to specific. Headers with links allow an easy return to various levels of the Web site.

Document Availability:

Most data sets are available only to paying members of the consortium, but a growing number are available through anonymous FTP. The Criminal Justice Data, the Publication Related Archive, the FastTrack, and a small part of the Aging Data are available to nonmembers. However, code books are not always available electronically, and a data set without a code book is useless. Conversely, a growing number of electronic code books are available for downloading or for searching over the Web even where the data themselves are not freely available. The options available to members and nonmembers in terms of data and code book retrieval are always clearly presented.

Cost:

All data sets in the archive are available to members upon payment of annual dues, which are based on the institution's enrollment size and the highest relevant degree awarded. Institutional annual dues for 1997-1998 ranged from $2,000 to $10,350 depending on the membership category. Nonmembers may buy data sets, but they are relatively expensive.

Comparison with Other Web Sites:

The ability to keyword search a large data archive and then immediately retrieve the code book and the related data set is a major strength of the ICPSR. Several other Web pages provide similar or complementary features. The University of California at San Diego's Data on the Net (odwin.ucsd.edu/idata/) is a well-organized index to Web sites worldwide that provide access to catalogues and/or data sets. Columbia University's Electronic Data Service (www.columbia.edu/acis/eds/) provides a detailed, menu-driven search engine for its data library, which includes a search of the ICPSR Archive. The national data archives of several European countries have placed their catalogues on the Web, and the Council of European Social Science Data Archives (CESSDA) provides a Web site (www.nsd.uib.no/cessda/IDC) that searches ten of these catalogues using a detailed menu-driven search engine.

Positive Aspects:

The ICPSR Web page truly represents the potential of the Web to facilitate quantitative social science research. Structurally the site is simple, with an intuitive hierarchical layering of screens from general subject categories to subsets of an individual data set. The ability, without an intermediary, to order any one of over four thousand data sets for immediate use is indeed a major advance. The accessibility of code books for retrieval through downloading or interactive Web searching increases the researcher's ability to easily and quickly determine the utility of any data set.

Recommendations for Improvement:

The ICPSR needs to reorganize its Web "Table of Contents" to better align sections and subsections. The section "Other Resources" is really a miscellaneous category; its subsections "Web Services Policy," "Computer Assistance," and "Electronic Services" are policy statements, not "Other Resources." The American National Election Study's "ANES Subsets and Stats" site should be put under "Archive," and "FastTrack" should also be a featured item under "Archive." In other words, pages that access data should all be under one section, and policy statements should be under another. The keyword search engine is relatively simple. With only 4,200 data sets this is currently not a serious issue, but the ability to use "or" and to nest terms with parentheses will quickly become desirable as the archive grows.