This guide provides an overview of archival collection datasets (archives as data) made available by UCSF Archives and Special Collections, including guidance for accessing and using the data and descriptions of its form and content. The guide also references archival collection datasets from other institutions and organizations that may be of interest for health sciences research, and gives an overview of digital methods that can be used to analyze archives as data.
What is archives as data?
“Archives as Data” refers to archival collection materials in digital form that can be shared, accessed, analyzed, and referenced as data. Using digital tools, researchers can work with archives as data to explore and evaluate characteristics of collection materials and analyze trends.
What can you do with archives as data?
Computational methods can be applied to archives as data to, for example, calculate word frequencies in text or visual characteristics in images; identify place, event, personal, or corporate names; propose common topics within or across textual corpora; or assign sentiments based on the language used in a subset of documents from a collection. Among other analyses, archives as data can support research that surfaces previously unarticulated relationships between people, institutions, places, ideas, and events, or that maps any of those across place and time.
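As a small illustration of the first method mentioned above, word frequencies, the following is a minimal sketch in Python using only the standard library. The function name word_frequencies and the sample sentence are hypothetical and are not drawn from any UCSF dataset; text from a real collection would typically be loaded from files and may need cleanup first.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common lowercase word tokens in text."""
    # Lowercase the text and keep only runs of letters; digits and
    # punctuation are dropped by this simple tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens).most_common(top_n)


# Hypothetical sample standing in for a transcribed collection document.
sample = "The clinic opened in 1900. The clinic served the city, and the city grew."

for word, count in word_frequencies(sample, top_n=3):
    print(word, count)
```

In practice, very common words such as "the" usually dominate raw counts, so analyses of this kind often filter a stop-word list before counting.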
Considerations for working with archives as data
Digital content from archival collections often needs to be prepared or processed before it can be worked with as data. While archives as data allow for useful and innovative digital analysis, it is important to be aware that source materials are likely to have undergone some processing or other preparation to become a dataset. Some examples include: