Digitizing Patient Information and Laboratory Research Data for Archival Reference and Research

Nancy McCall, Lisa A. Mix, and Anne J. Gilliland-Swetland, Investigators

Progress Report - 22 January 1996

McCall and Mix

Case Studies

Contents of this Report

1. Project Design

2. Report of Progress

3. Products

4. Plan of Work and Schedule

1. Project Design

As presented in the original fellowship proposal (January 1995), we have designed two case studies involving documentation in the health fields from the first half of the twentieth century. One study concentrates upon clinical records; the second study focuses on observational and experimental records from the health sciences. Our objective is to examine common issues (conceptual, technical, legal, economic, and ethical) in the digitization and Internet communication of these two types of documentation. The ultimate goal of the case studies is to develop electronic models for reference and research use of clinical records and records from the health sciences. We aim to produce generalizable models that may be adapted by other archival programs with documentation from the health fields. For the purpose of the case studies, we have focused upon the following two collections of documentation from Johns Hopkins:

a. The Patient Records of the Brady Urological Institute (BUI) of The Johns Hopkins Hospital (1915 - 1975) - Dr. Patrick Walsh, Director of the Brady Urological Institute, and Dr. Steven Docimo, Assistant Professor of Pediatric Urology, assisted us in selecting record samples to bring to Ann Arbor for study during July. They recommended that we select records of patients who had been diagnosed with posterior urethral valves (a serious congenital condition that presents either in infancy or later in adolescence). They chose this diagnostic entity because of the historical importance of the early cases recorded (1915 - 1975) and the ongoing clinical interest at Johns Hopkins in the diagnosis and treatment of posterior urethral valves. Urologists from Johns Hopkins were pioneers in the diagnosis and early treatment of posterior urethral valves; their publications of these cases are regarded as classics in the literature of pediatric urology. We located the 30 patient files listed in the BUI diagnostic index under this diagnosis and shipped them to Ann Arbor.

b. Experimental and Observational Data from the Psychobiology Laboratory of The Johns Hopkins University School of Medicine (1920 - 1975) - Dr. James Wirth, Director of the Eating and Weight Disorders Program, and Dr. Timothy Moran, Professor of Psychiatry, assisted us in selecting record samples to bring to Ann Arbor for July. We concentrated on locating records from key areas of research that Curt Richter conducted while directing the Psychobiology Laboratory (1920 - 1975). Richter was a creative investigator who founded several fields of current research, kept easily interpretable records, and published only part of his decades of research; this combination makes his records a source of interest not only to scientists and clinicians, but also to historians and other social scientists. We chose record samples from the following areas of research: a natural history of the grasp reflex; periodic phenomena in animals and man; and a neuro-endocrine study of spontaneous gross bodily activity or energy. For the purpose of the case study, we chose a broad range of record types (logbooks; activity charts; Esterline-Angus charts; and photographs of equipment, laboratory staff, and research subjects) to ship to Ann Arbor.

2. Report of Progress

We report the following developments and findings from our first six months of research (July 1995 - January 1996):

a. Conceptual development - During July 1995 we conducted intensive research on issues regarding access, ethics, and legal concerns. We also conducted appraisal studies of the two sets of records in order to determine whether they were appropriate for digitization and Internet communication. Appraisal studies involved applying current appraisal theory to the records at hand; analyzing citation patterns to determine the relevance of the records to current research; and consulting subject experts. Since our return from Ann Arbor, we have continued to consult colleagues, both at Johns Hopkins and elsewhere, to determine the intellectual value of the records and to plan strategies for making them accessible in a meaningful way.

From our research we have concluded that any digitization plan must be driven by the intellectual content of the records. Moreover, certain discipline-specific aspects of the record will influence plans for digitization. While physical and technological obstacles are very real, the evidential and informational importance of the records must carry a heavier weight on the appraisal scale.

b. Technical trials - Clinical and scientific documents from the health fields (1915 - 1975) pose special challenges for digitization (e.g., non-standard sizes; barely legible entries in faint ink and lead pencil; color coding; and data that require the preservation of numeric inferences). Preliminary experiments in digitizing a selection of scientific and clinical documents indicate that specialized processes are required to image documents that come in non-standard sizes, contain faint entries, or use color coding. Moreover, to preserve numeric inferences in clinical and research logs, entries must be keyed (not scanned) into a data base.

Document preparation and quality controls

Scanning and data entry processes for clinical and scientific documentation require considerable preparation and stringent quality controls to ensure fine resolution of images and reliable entry of data. Preparation of documents and quality controls involve a varied and highly specialized workforce (e.g., conservators, domain experts, photographers, scanning technicians, data base designers, archivists, librarians, and other information specialists). Conservators should be consulted to advise and, in some instances, to assist in document preparation (cleaning, flattening, removal of staples and clips, mending, and other processes to ensure that documents are in optimum condition for scanning and digital photography).

Because the documents from the Brady Urological Institute were in good physical condition and in excellent intellectual order, preparing them for scanning involved only minor, routine procedures (removal of staples and paper clips; surface cleaning of dust and carbon smudges). Preparing documents from the Psychobiology Laboratory, however, was far more extensive and required treatment by a conservator. These documents had been contaminated by lead paint dust, which made them a health hazard, and their content was obscured by surface dust. Lead particles had become embedded in the porous paper and could not be completely removed by surface cleaning. As a safety precaution, our occupational health advisors recommended that any records selected for the case study be encapsulated. Franklin Mowery, director of conservation for the Folger Shakespeare Library, agreed to encapsulate the selected records. Our occupational health advisor from The Johns Hopkins University School of Hygiene and Public Health recommended safety procedures for Mowery to follow during encapsulation. Mowery also did additional cleaning and mended tears so that the records would be in optimum condition for imaging.

Imaging processes and data entry

We shipped our selected samples of clinical and scientific records to the Historical Center for the Health Sciences (HCHS) for digital experimentation. Because many of the scientific samples were large documents in non-standard sizes, we could not image them with HCHS's equipment. Since we could locate neither a scanner large enough to accommodate these documents nor a working digital camera, we opted to photograph the documents on 35 mm film and then scan the photographic images onto a compact disc. The result was quite disappointing because the images lost resolution and detail in the reduction to 35 mm. Various digital experts have since advised that we conduct additional tests on these records with a large flatbed scanner and a digital camera that can produce 4" by 5" negatives. We are scheduling additional tests with vendors in the next two months.

Since most of the clinical samples were standard letter (8.5" by 11") and legal (8.5" by 14") sizes, we were able to use HCHS's desktop scanner to image a selection of clinical documents. Because the documents contained fading script in ink and pencil, color coding, rust marks from metal clips, and tears and abrasions from staples and clips, we had to make preview scans and adjust contrast and color to produce high-quality images. Although scanning the clinical records was labor-intensive, we were quite pleased with the overall quality of the images.

The results of the digital trials that we conducted in Ann Arbor, and follow-up consultations with imaging and data entry specialists at the National Library of Medicine, the Library of Congress, and Sociometrics Inc., have led us to recognize the importance of pre-testing a range of digital processes. At this stage in the development of digital technologies, equipment varies widely in design and performance. Pre-tests are especially important for projects that involve documents in non-standard sizes and documents with complex content (e.g., fading script, intricate graphics, color coding). Various types of scanners and digital cameras should be tested to determine which process best suits the materials under consideration.

Data entry should also be tested by submitting samples of the data to be keyed to data entry services. In the health fields a wide range of discipline-specific data entry services exists. For instance, some firms specialize in the entry of clinical data and information, while others specialize in the entry of scientific data and information. It is important to channel discipline-specific samples to the appropriate data entry services. Samples should be triple-keyed to obtain the highest possible accuracy (99% error-free). The quality of the results should help determine which data entry service to select.
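The reconciliation step behind triple keying can be sketched as a majority vote: a field is accepted when at least two of the three independent keyings agree, and it is flagged for review against the original document when all three differ. The following is a minimal Python sketch under that assumption; the function names and sample values are hypothetical, and an actual data entry service may reconcile keyings differently.

```python
from collections import Counter
from typing import List, Optional, Tuple

# One field as keyed independently by three operators (hypothetical layout).
Keying = Tuple[str, str, str]


def reconcile_field(keyings: Keying) -> Optional[str]:
    """Return the value entered by at least two keyers, or None if all three differ."""
    value, count = Counter(keyings).most_common(1)[0]
    return value if count >= 2 else None


def reconcile_entries(entries: List[Keying]) -> Tuple[List[Optional[str]], List[int]]:
    """Reconcile a list of fields; return accepted values and indices needing review."""
    accepted: List[Optional[str]] = []
    needs_review: List[int] = []
    for index, keyings in enumerate(entries):
        value = reconcile_field(keyings)
        if value is None:
            needs_review.append(index)  # check this field against the original page
        accepted.append(value)
    return accepted, needs_review


if __name__ == "__main__":
    # Hypothetical log entries; the second field was keyed three different ways.
    entries = [("12.5", "12.5", "12.6"), ("340", "348", "343"), ("7", "7", "7")]
    values, review = reconcile_entries(entries)
    print(values)  # ['12.5', None, '7']
    print(review)  # [1]
```

Fields that fail the vote are not guessed at; they go back to a person who can read the original entry, which is where faint pencil and ambiguous numerals would most likely surface.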

Database design

We are currently studying issues associated with access to, retrieval of, and use of documents from the scientific and clinical samples. In collaboration with domain experts from the School of Medicine and data base specialists from the Genome Data Base and the Welch Medical Library, we are drafting database designs for the scientific and clinical samples. Clinical advisors want rapid access from the diagnostic index to the patient records; we are therefore planning to link the diagnostic index to the samples of patient records. Ultimately, we hope to have a web-like interface that allows clinicians to go from the diagnostic index to an image of the record. Scientific advisors want to be able to manipulate data from laboratory logs in order to study the results of experiments, and to have a computer program that graphs the logs so that they can be compared with images of the original hand-charted records.
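The planned link from the diagnostic index to the patient records could be modeled relationally, as in the following minimal Python sketch using the standard-library `sqlite3` module. The schema, table and column names, accession numbers, and image file names are all hypothetical; the sketch only illustrates how a diagnosis term can be followed through a linking table to the page images of each record filed under it.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: diagnostic index terms, records, the links between them,
# and one image file per scanned page.
SCHEMA = """
CREATE TABLE diagnosis (
    diagnosis_id INTEGER PRIMARY KEY,
    term         TEXT UNIQUE NOT NULL
);
CREATE TABLE patient_record (
    record_id TEXT PRIMARY KEY          -- accession number, not a patient name
);
CREATE TABLE record_diagnosis (
    record_id    TEXT    NOT NULL REFERENCES patient_record(record_id),
    diagnosis_id INTEGER NOT NULL REFERENCES diagnosis(diagnosis_id),
    PRIMARY KEY (record_id, diagnosis_id)
);
CREATE TABLE page_image (
    record_id  TEXT    NOT NULL REFERENCES patient_record(record_id),
    page_no    INTEGER NOT NULL,
    image_path TEXT    NOT NULL,
    PRIMARY KEY (record_id, page_no)
);
"""


def images_for_diagnosis(conn: sqlite3.Connection, term: str) -> List[Tuple[str, int, str]]:
    """Follow the diagnostic index from a diagnosis term to the page images of each record."""
    return conn.execute(
        """
        SELECT pi.record_id, pi.page_no, pi.image_path
        FROM diagnosis AS d
        JOIN record_diagnosis AS rd ON rd.diagnosis_id = d.diagnosis_id
        JOIN page_image AS pi ON pi.record_id = rd.record_id
        WHERE d.term = ?
        ORDER BY pi.record_id, pi.page_no
        """,
        (term,),
    ).fetchall()


def demo() -> List[Tuple[str, int, str]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical sample rows; real accession numbers and file names would differ.
    conn.execute("INSERT INTO diagnosis (diagnosis_id, term) VALUES (1, 'posterior urethral valves')")
    conn.executemany("INSERT INTO patient_record (record_id) VALUES (?)", [("BUI-A",), ("BUI-B",)])
    conn.execute("INSERT INTO record_diagnosis VALUES ('BUI-A', 1)")
    conn.executemany(
        "INSERT INTO page_image VALUES (?, ?, ?)",
        [("BUI-A", 2, "bui-a-p2.tif"), ("BUI-A", 1, "bui-a-p1.tif"), ("BUI-B", 1, "bui-b-p1.tif")],
    )
    rows = images_for_diagnosis(conn, "posterior urethral valves")
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Only record BUI-A is linked to the diagnosis, so the query returns its two pages in page order and omits BUI-B. Keying records by accession number rather than by patient name also keeps personal identifiers out of the index itself.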

In serving prospective scientific and clinical users, we also hope to create databases that reflect the internal needs of specific disciplines and are accessible to outside users (e.g., social scientists and humanists) who study those disciplines. We intend to test the databases with a range of projected users, including archivists, humanists, and social scientists. Domain experts should be asked to elucidate the content of documents; to advise on ways that data and information may be accessed and used for reference and research in specific disciplines; and to test the models for reference and research that are developed. Databases should be designed around access and usability issues. Sample databases should be tested for functionality by domain experts, information specialists, and a sample pool of users.

Internet communication

From the beginning of the case studies, our goal has been to develop models for disseminating archival documentation from the health fields on the Internet. After conferring with colleagues at Johns Hopkins and the National Library of Medicine, we have decided to add a closed, password-protected site for the case studies to the WWW site for the Medical Archives. We have learned that closed sites are being used in the health fields for the exchange of sensitive clinical information. The site will be secured with Basic By-Password Authentication. We will make the site accessible only to those individuals working with us on the case studies. This site will also be used for a conference to be held this summer. Summary descriptions of the project, and other nonsensitive information, will continue to be openly available on the Medical Archives web site.
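In HTTP Basic authentication, the browser sends an `Authorization: Basic` header carrying the base64 encoding of `username:password`, and the server checks it against its password file before serving a protected page. The following Python sketch shows that check under stated assumptions: the password table, account name, and salted PBKDF2 hashing are hypothetical choices for illustration, not the configuration of the Medical Archives server.

```python
import base64
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple

# Hypothetical password table: username -> (salt, salted hash). Plaintext is never stored.
PasswordTable = Dict[str, Tuple[bytes, bytes]]


def hash_password(password: str, salt: bytes) -> bytes:
    """Salted PBKDF2-SHA256 hash (an assumption of this sketch)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def make_entry(password: str) -> Tuple[bytes, bytes]:
    """Create a (salt, hash) pair for a new participant."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


def check_basic_auth(header: Optional[str], users: PasswordTable) -> bool:
    """Validate an 'Authorization: Basic <base64(user:password)>' header value."""
    prefix = "Basic "
    if not header or not header.startswith(prefix):
        return False  # no credentials: the server would answer 401 with a challenge
    try:
        decoded = base64.b64decode(header[len(prefix):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    username, sep, password = decoded.partition(":")
    if not sep or username not in users:
        return False
    salt, stored = users[username]
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(hash_password(password, salt), stored)


if __name__ == "__main__":
    users = {"reviewer": make_entry("example-password")}  # hypothetical account
    good = "Basic " + base64.b64encode(b"reviewer:example-password").decode("ascii")
    bad = "Basic " + base64.b64encode(b"reviewer:wrong").decode("ascii")
    print(check_basic_auth(good, users), check_basic_auth(bad, users))  # True False
```

Because base64 is an encoding rather than encryption, credentials sent this way are readable in transit unless the connection itself is encrypted, which bears on how much sensitive clinical material such a site should hold.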

c. Financial analysis - Because our digital trials were especially labor-intensive, with extensive document preparation and stringent, time-consuming quality controls, they were quite costly to conduct. The major cost factors of digital projects, like those of microfilming projects, are concentrated in the labor of document preparation and quality assurance. The technology itself is comparatively inexpensive.

Since we had to out-source most of the document preparation and scanning for the scientific samples, we were able to keep an accurate record of costs. Charges for cleaning and encapsulation were $25 per document; charges for photography and scanning were approximately $25 per document. At a total of $50 per document, digitizing the 20,000 charts in the Psychobiology Laboratory collection would cost $1,000,000. This estimate includes neither the development of a data base nor the data entry of the laboratory log books. While there may be ways to reduce some of the overall costs, it would still be very expensive to digitize a collection that requires such extensive document preparation and quality controls.
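The estimate above is a simple per-document product. A minimal Python sketch of that arithmetic follows, using the report's per-document figures; the function name is hypothetical, and the model assumes charges scale linearly with the number of documents, which may not hold if vendors offer volume pricing.

```python
def digitization_cost(n_documents: int, prep_per_doc: float, imaging_per_doc: float) -> float:
    """Per-document preparation plus imaging charges, multiplied by collection size.

    Data base development and data entry of the laboratory log books are
    excluded, as in the report's estimate.
    """
    return n_documents * (prep_per_doc + imaging_per_doc)


if __name__ == "__main__":
    # Figures from the report: $25 cleaning/encapsulation, about $25 photography/scanning.
    whole_collection = digitization_cost(20_000, 25.0, 25.0)
    print(f"${whole_collection:,.0f}")  # $1,000,000

    # A hypothetical selection of 500 charts at the same rates.
    selection = digitization_cost(500, 25.0, 25.0)
    print(f"${selection:,.0f}")  # $25,000
```

Because preparation and imaging charges are both per document, reducing the number of documents selected is the most direct lever on total cost.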

Because we did the document preparation and scanning of the clinical files ourselves, we cannot give a true estimate of costs. Much of our time went into learning scanning procedures and working out a common plan for document preparation.

Because of the prohibitively high costs of digitizing entire collections of scientific and clinical documentation, most repositories will be able to afford to digitize only small selections from their collections. Developing appraisal criteria for scientific and clinical collections therefore involves cost-benefit analysis.

d. Copyright and intellectual property issues - Although we did extensive reading in copyright and intellectual property law and started a bibliography on these subjects at the University of Michigan Law Library in July, we have not yet developed models for copyright and intellectual property rights in clinical and scientific records. Over the next six months we intend to work on these legal issues with Johns Hopkins counsel and other legal experts.

e. Ethical implications - As we proceed with the case studies, new ethical issues emerge. For instance, the allocation of resources for costly digital projects may become a major ethical issue for many repositories. Documents involving patients and research subjects (human and animal) cannot be adequately protected on an open WWW site. Moreover, searching for and deleting personal identifiers is labor-intensive and therefore very costly. A major drawback of deletion is that removing personal identifiers threatens to distort the context of data and information in the documents. As we establish a closed WWW site for disseminating information about the case studies, we must investigate how closed WWW sites are being managed in the health fields in order to develop future policy on access to and use of clinical and scientific documentation on the Internet.

3. Products

To date we have produced or coordinated the development of the following products:

4. Plan of Work and Schedule for 1996

The findings of the first five months of our Bentley project guide the following research plan for 1996:

a. Solicitation of vendors to find the highest-quality, least costly process for digitizing oversize documents and documents with color coding and notations in pencil.

b. Work with conservators and quality control experts to expedite digitization processes and to improve quality assurance.

c. Develop a closed WWW site for the clinical and scientific case studies rather than concentrate on having an open WWW site with personal identifiers deleted. Only those individuals participating in the Bentley project would have access to the closed WWW site.

d. Involve a broad range of clinicians, scientists, social scientists, humanists, and archivists in developing appraisal criteria for the selection of materials to be digitized. We plan to hold several Internet conferences on a WWW site established for this purpose.

e. July 1996: During the first two weeks we will conduct a conference via the World Wide Web; participants will include those listed above. McCall and Mix will spend the third week assimilating the information from the WWW conference in preparation for the final week in Ann Arbor. During the last week in July we plan to meet in Ann Arbor to draft our final report and to prepare a manuscript for submission to a journal.
