- The Design for Diversity Learning Toolkit - https://des4div.library.northeastern.edu -

Digitizing and Enhancing Description Across Collections to Make African American Materials More Discoverable on Umbra Search African American History / Dorothy Berry

This case study describes a project undertaken at the University of Minnesota Libraries to digitize materials related to African American history across the University’s holdings, and to highlight materials that are otherwise undiscoverable in existing archival collections. It explores how historical and current archival practices marginalize material relevant to African American history and culture, and how a mass digitization process can attempt to highlight and re-aggregate those materials. The details of the aggregation process (for example, the need to use standardized vocabularies to increase aggregation even when those vocabularies privilege majority representation) also reveal important issues in mass digitization and aggregation projects involving the history of marginalized groups.

by Dorothy Berry, Digital Collections Program Manager, Houghton Library at Harvard University

Problem and Context Statement

In 2016 University of Minnesota Libraries staff began the work of digitizing materials related to African Americans’ history, as part of a Council on Library and Information Resources (CLIR) funded grant “Digitizing African American Archival Materials Across University of Minnesota Collections.” The project was designed to increase the body of materials available on Umbra Search African American History, a widget and search tool hosted by the Libraries to aggregate African American related materials from digital collections nationally, as well as to serve as an example of the sort of digitization work that can be done by large collecting institutions to share the hidden depths of diverse holdings. A project manager was hired to manage the description work, rights assessments, and digitization workflow, under the management of the head of Digital Libraries Services and the grant’s Principal Investigator, who is also the PI for Umbra Search.

It is important to note that the majority of materials identified for this project were not from collections clearly marked “African American History” or anything of the like. Thousands of records were identified across over one hundred collections in units with development policies that are not at all related to race or Black history. Other institutions that may think themselves excluded from such a project may find they too have a wealth of relevant materials that have simply never been pulled together due to the nature of their original order.

The project set out to digitize an estimated 500,000 files at the folder record level, located in 74 different collections, producing around 10,000 new records. For example, a folder titled “African American Social Group: Minneapolis” might contain 50 sheets of paper, which when scanned front and back create 100 digital files: one new record with 100 new files. The digitization workflow was envisioned as holistic. It included assessing and confirming the rights required to put documents online, enhancing the existing folder level description to make clear the relevance to African American history, digitizing each folder in full, and uploading the materials to UMedia, the University of Minnesota’s institutional repository, to be harvested by the Digital Public Library of America and then by Umbra Search in turn. The digital records would also be linked on ArchivesSpace, the University of Minnesota’s CMS.

The impetus for this project was twofold. The first goal was to increase the representative holdings from Umbra Search’s host institution. The second was to test-run a mass digitization of African American materials and, ideally, to document a process that could be repeated by other universities and institutions with similar infrastructures. Both goals would serve the dual role of increasing Umbra Search’s functionality and generally increasing the amount of African American material available in digital collections.

Descriptive Analysis

Before the Project Manager (PM) was brought on, some preparatory work had been done to identify relevant collections, leading to the initial estimate of 500,000 files. Upon initial PM assessment, however, it became clear that large amounts of the identified materials would have to be excluded from digitization due to irreconcilable rights issues. The issues around rights are less connected, in this situation, to materials being related to marginalized communities, and more connected to the transition into digital collections. When most of these materials were originally collected, the concern was having physical access to the materials. Early deeds of gift and chains of custody are often nebulous when it comes to intellectual rights, and often were conceived before digital collections as we know them were even imaginable. The preparatory work around this specific project was based solely on identification, and the labor was not done by someone with any background on rights as they relate to archival and digital collections.

The subtraction of those materials brought the identified, usable number down to around 200,000. To bring our numbers closer to the original estimates, and to make sure as much relevant content was included as possible, a new search methodology was established. A list of potentially relevant words used across time to refer to African Americans and their history was devised, and each collection finding aid available online was individually searched. [1] This systematic approach unearthed around 300 previously unidentified folders in an additional 76 collections.
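The search methodology described above can be sketched in a few lines of Python. This is only an illustration, not the tooling the project used: the function names and the sample finding aid text are hypothetical, and the term list is the one given in footnote 1. The sketch assumes each finding aid is already available as plain text.

```python
import re

# Search terms taken from footnote 1 of this case study.
SEARCH_TERMS = [
    "Negro", "Colored", "Black", "Afro-American", "African American",
    "Interracial", "Civil Rights", "Human Relations", "Intergroup",
]


def find_terms(text, terms=SEARCH_TERMS):
    """Return the terms that occur in text as whole words, ignoring case."""
    hits = []
    for term in terms:
        # Lookarounds keep "Black" from matching inside "Blackboard".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def search_finding_aids(finding_aids):
    """Map each collection name to the terms found in its finding aid text."""
    results = {}
    for name, text in finding_aids.items():
        hits = find_terms(text)
        if hits:
            results[name] = hits
    return results


# Hypothetical finding aid excerpts, for illustration only.
sample = {
    "Collection A": "Folder 12: Interracial committee minutes, 1948",
    "Collection B": "Folder 3: Blackboard purchase receipts",
}
matches = search_finding_aids(sample)
```

A match is only a lead. Terms such as “Black” and “Colored” also appear as surnames and ordinary adjectives, so each hit still needs review by someone who knows the collections.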

The project workflow had quite a few moving parts from start to finish. Boxes were requested from a single curator and collection at a time, until a discrete collection’s identified materials were completely digitized. Once boxes were delivered to the PM, folders were marked for digitization both physically and through a color-coding system in a shared metadata spreadsheet. The boxes were then delivered to a suite of student scanners, whose process included scanning each sheet, quality checking for errors, making corrections if needed, and renaming files according to an archives-wide numbering system. Each of these steps was tracked both through the color-coding system on the metadata sheet (a different color for each step) and on a Trello board with a card for each collection, which was moved forward to a new column at each step.

While the students completed their workflow, the PM wrote enhanced descriptions and subject headings for collections at the folder level. These descriptions were designed to forefront the materials’ relevance to African American history and culture, while maintaining their collection level context. The format of the descriptions was formalized across the project, beginning with a sentence of collection level description followed by a specific description of the folder’s content. For example: “This folder contains materials created/collected by Organization X. Organization X was a Civil Rights group founded in 1947. This folder specifically contains promotional materials from Organization X’s 1962 National Conference.” While this format is slightly clunky, it was designed to be effectively contextualized in the Umbra Search aggregation platform, where each folder would be viewed, initially, out of collection level context, while not causing confusion when viewed through other access points like ArchivesSpace, UMedia, and DPLA. If a digitized folder was found through ArchivesSpace, the user would, by design, be aware of the finding aid’s hierarchy. In that context, it would be clear that a folder within the “Organization X records, 1950-1970” was created/collected by Organization X. There would also, most likely, be collection level description telling the history of that organization. Umbra Search unmoors these folders from their collections, which is beneficial in that it takes away the prerequisite knowledge of archival organization, but adds a wrinkle in description. A folder titled “Outdoor Activities, 1967-1968” has descriptive context when discovered via the Harlem YMCA Photographs collection finding aid. The title, by itself, gives very little context as to relevancy. The choice to detail collection context in folder level description was designed to give necessary information without forcing the user to go back to the collection level finding aid.
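The three-sentence description format can be expressed as a small template. The following is a minimal sketch, assuming the description is assembled from three pieces of text supplied by the describer. The function and parameter names are hypothetical and were not part of the project’s workflow, where descriptions were written by hand.

```python
def build_folder_description(creator, creator_context, folder_contents):
    """Assemble a folder level description that carries its collection context.

    creator         -- name of the creating/collecting body
    creator_context -- predicate describing the creator, without a final period
    folder_contents -- what this specific folder holds, without a final period
    """
    collection_sentence = (
        f"This folder contains materials created/collected by {creator}."
    )
    context_sentence = f"{creator} {creator_context}."
    folder_sentence = f"This folder specifically contains {folder_contents}."
    return " ".join([collection_sentence, context_sentence, folder_sentence])


description = build_folder_description(
    creator="Organization X",
    creator_context="was a Civil Rights group founded in 1947",
    folder_contents="promotional materials from the 1962 National Conference",
)
```

Keeping the collection sentence first means the record still reads sensibly when an aggregator shows it without its finding aid hierarchy.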

Subject headings were designed with similar goals. We chose to source primarily from the Library of Congress Subject Headings (LCSH) and Library of Congress Name Authority Files (LCNAF), on the assumption that those descriptors’ wide usage would make them most convenient in a cross-collection aggregation context. New local subject headings were also created to bring attention to African American individuals and organizations that may previously have been difficult to discover. Most of the newly created local subjects were for discrete organizations and individuals that may not have LCNAF or LCSH entries but were coming up repeatedly within our own collections. This was especially prevalent with collections in the Social Welfare History Archives, where numerous mentions of obscure African American organizations began to form a subject-worthy corpus. Subjects and descriptions were also designed to be harvested from DPLA to Umbra Search, which meant that at least some terms used also had to be found on the list of words Umbra Search uses to pull materials. When necessary, newly relevant terms were added to that list. Due to the nature of LCSH, the new additions often took the form of what I’ve come to think of as African American Mad Libs: African American + [occupation], African American + [gender], African American + [city, state], etc. While there is a lack of nuance and elegance in this format, the decision was made to move forward with these subjects, as the popularity of LCSH increases the probability of cross-referencing in an aggregation context.

This was particularly, personally, problematic with certain LCSH terms like “African American juvenile delinquents.” The preference from the PM perspective was something more like “African American interactions with the criminal justice system” or “African American youths -- criminal justice system.” The issue with this sort of more culturally respectful and relevant subject heading was that, in the context of increasing aggregation and access across systems, in-house subject headings only talk to themselves. If no one else is using the same terminology, then when searches are pulled across multiple collections, users won’t get the best search results. This is an issue with the nature of digital archives and special collections aggregation, where different repositories have different ways of upholding descriptive best practices. The goal here, making sure the broadest amount of material was aggregated effectively, led to the occasional use of LCSH terms that do not meet the radically culturally relevant possibilities of archival description.
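The way in-house headings “only talk to themselves” can be shown with a toy aggregation index. This is a simplified sketch, assuming an aggregator that groups records by exact subject string. The record identifiers and repository names are hypothetical, and real aggregators such as DPLA or Umbra Search may match subjects differently.

```python
from collections import defaultdict


def index_by_subject(records):
    """Build a mapping from subject heading to the ids of records that carry it."""
    index = defaultdict(list)
    for record in records:
        for subject in record["subjects"]:
            index[subject].append(record["id"])
    return dict(index)


# Hypothetical records from three repositories.
records = [
    {"id": "repo-a/1", "subjects": ["African American juvenile delinquents"]},
    {"id": "repo-b/7", "subjects": ["African American juvenile delinquents"]},
    {"id": "repo-c/3",
     "subjects": ["African American youths -- criminal justice system"]},
]
index = index_by_subject(records)
```

In this sketch the shared LCSH term gathers records from two repositories, while the locally coined heading returns only the one record from the repository that invented it, even though all three records concern the same topic.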

Another reason that African American related materials were previously difficult to discover is that they are often a single thread running through a larger collection. Archivists processing a collection often don’t have the time or ability to add folder or item level subjects, which again privileges majority representation. When a processing archivist can add only a limited number of subjects and must do so at the collection level, it is less feasible to bring hidden marginalized histories to the forefront. For example, if the “United States Colleges’ Homecoming flyers collection” has 500 folders with flyers, and 489 of those flyers are from PWIs (primarily White institutions), it makes little sense to have the LCSH approved “HBCUs (Historically black colleges and universities)” as a collection level subject heading. Without folder level subject detailing, those 11 HBCUs are lost in the greater body of PWI materials. In the course of this digitization project, the PM had the ability to go through a similar collection of college and university memorabilia and give the appropriate subject heading to each HBCU, as well as a geographic subject, e.g. “African Americans -- Georgia.” This sort of detail work is not financially viable or efficient enough to fit into the workflow of processing departments that are generally already overtaxed, but in a project-funded situation it can greatly assist in recognition and accessibility.

Once a collection was completely digitized, enhanced with new description, and assessed for rights, it was uploaded in full to our institutional repository, after which a second group of students attached digital objects to ArchivesSpace records, increasing avenues of access to the newly digitized and enhanced materials. Increasing access was the continuing goal of this project, and required taking into account the needs of a variety of stakeholders: most importantly casual researchers, but also the collection curators and more serious researchers. The project had to take into account the formats of the different access points, as well as the descriptive standards already in place at the University of Minnesota. Different records are presented in varying ways across platforms, and this was a constant consideration.

This project has led to increased discussions across the university library system about equitable description and making sure materials are widely discoverable. By setting out a project with this as a clear goal in mind, the door to conversation has been opened. As the project continued and more materials were digitized, curators began to be more actively conscious of relevant materials, stopping by the PM’s office to mention a newly discovered box or folder with African American materials. Personal conversations about particular descriptions (often done before the time of any current employees) also led to increased awareness, or led to a heightened interest amongst the already generally informed staff.

There was also a benefit in being able to do descriptive labor that is outside the scope of regular processing. The resulting enhanced descriptions benefit larger collections that were outside the project, and provide a potential guide for future processing work.

Starting with an inventory that didn’t take rights assessment into account led to a period of clean-up work, researching more project additions while digitization was already in progress. If given the opportunity to start the project over again, we would make a greater effort to create a comprehensive list before beginning our workflows. This would mean pulling rights assessment in much earlier in the planning process: our original numbers included 100+ cubic feet of contemporary clippings collections that were immediately excluded. Unfortunately, it is very difficult to do this sort of work before receiving funding, and funding agencies generally would like to see a preliminary list before awarding funding. On a similar note, developing a clear search strategy for discovering relevant records before the project began would have made the entire process more efficient. As the PM, I came into the project assuming that the preliminary list was a complete list, but after a large amount of material was excluded based on rights assessments, there was a period of catch-up before I developed a systematic search strategy with a set list of relevant words and a tracking sheet that marked each collection remaining to be searched.

Power played a role in this project in a macro sense, rather than on any personal level. Discovering materials related to marginalized peoples is more difficult due to limited time and funding, which leads to a “more product, less process” style of arrangement and description that paints in broad strokes. As African American stories generally exist in the margins of those broad strokes, those details are often left out of the descriptive process. Striking a balance between providing complex description and providing access to the largest amount of materials possible is a struggle across archives and special collections, and this project provided an opportunity to work outside of those traditional limitations. The very specific aggregation-focused context of this project caused some of the inconveniences mentioned earlier, but other metadata enhancement and digitization projects without that focus could do well to expand their descriptive process more freely, and with transparency.

There are many situations in which the perfect stands as the enemy of the good. One concern that was occasionally expressed about this particular project was that certain folders/collections would now have extremely detailed description, while other folders/collections would remain at their minimally processed level. Viewing the situation with best intentions, there is generally a dream archive where there is infinite time and labor to make sure each new accession gets a very detailed and high-touch description. Since we do not work in that fantasy field, compromises have to be made. When processing a collection at its highest level, there are usually not the resources to take the time to unearth hidden materials related to marginalized identities, or even to take the step before that of educating oneself on the potentially relevant histories. Because this digitization project was designed with what archivist Kelly Bolding recently called a “descriptive audit” in mind, there was time and funding allotted not only to identify African American related materials, but also to do the research necessary to write enhanced descriptions that make those relations evident.

Next Steps

The success of this project provides an example of how dedicating time to increasing discovery for materials related to marginalized people can lead to huge increases in available records, and can force staff to forefront those materials in the course of their daily labor. Although not every institution can be awarded a Hidden Collections Grant to fund full-time work on a similar mass digitization project, this work can be executed on a smaller scale over a longer period of time. The digital landscape provides an opportunity to look at materials outside the context of their collections without having to physically re-order them, disrupting concepts of respect des fonds which historically favor people with the power and privilege to securely store and collect materials over time. Respect des fonds is the archival principle that holds that materials should remain as organized by their creating body, be that an individual or an organization, or should remain in the organization received from the donor. There is definite value to this ordering system, especially in collections where there is a clear guiding hand from the creating body: if an organization chose to organize its records by department rather than by date, that provides some information about the organization’s planning processes. It becomes more complicated when materials are, for example, collected by someone with money/power, but represent the heritage and history of a marginalized person/group. Respect des fonds in that case can often mean a collection ends up represented as “Powerful Majority Person’s papers” with a description that says “Powerful Majority Person collected materials related to marginalized people from 1900-1920.” This continues the practice of decentering the people represented by those materials: the collector is at the forefront, and the creators are in the background.

The digital aggregation environment provides an opportunity to circumvent that decentering, without disrupting archival principles in practice. The digitized record both exists as a facsimile of the physical folder as described within its finding aid and stands alone as a digital record that is more easily accessible through keyword and subject searching than through diving into finding aids. Gathering together relevant materials at the folder or item level, using subject headings and tags that call attention to hidden histories, links those materials into a sort of artificial collection without truly pulling them from their collection/series/box/folder order.


  1. Search terms included: Negro, Colored, Black, Afro-American, African American, Interracial, Civil Rights, Human Relations, & Intergroup.