This case study describes the development of a custom metadata schema to support the description of 65 educational videos used to help teach sign language. The creators of this special collection had specific access and discovery needs that were not served by standard vocabularies, so the custom schema provides ways to describe information like the pace of the interpreter’s fingerspelling, the language being signed, and how space is being used by the interpreter.
By Sarah Sweeney, Digital Scholarship Group, Northeastern University
Problem and Context Statement
The National Interpreter Education Center (NIEC) received grant funding in 2011 to develop a web portal to host the Terp Talks, a series of 65 educational videos designed to help teach sign language for interpreting. The project team wanted to build a portal that was designed with the search and browse behaviors of sign language interpreters in mind, specifically by including search and browse features that allowed interpreters to narrow results using faceted terms familiar to the interpreter learning experience. In order to support this type of faceting, the files needed to be described using metadata that supported the characteristics of signed language materials and interpreters, including the pace of the interpreter’s fingerspelling, the language being signed, and how space is being used by the interpreter. Also important to the project team was the description of the videos as learning objects, including the intended audience, the type of text being translated, and the difficulty of the interpreting.
Vimeo and other low- or no-cost video hosting services could not accommodate the specific needs of the Terp Talk videos, so the NIEC partnered with Northeastern University Libraries (NUL) to host the streaming media for NIEC’s custom portal. NUL had just launched the Digital Repository Service (DRS), a beta version of a Fedora/Samvera repository for storing faculty, staff, and student work. NUL was interested in developing new features for the next version of the DRS that 1) allowed project teams to use the DRS API to display files and metadata stored in the DRS in an external service, like WordPress, and 2) created a flexible environment that accommodated the metadata needs of various project teams. NUL used this partnership as an opportunity to develop both of these features, but this case study will primarily focus on the design of a metadata schema to support the descriptive metadata needs of the Terp Talks videos.
Two teams were involved in this work. The project team included two professors from Northeastern University’s American Sign Language Program (Trudy Schafer and Cathy Cogen) and a contracted web developer (Damon Timm). The metadata team included two metadata librarians (Daniel Jergovic and Sarah Sweeney) and the Director of Library Technology Services (Patrick Yott).
Descriptive Analysis
The primary descriptive metadata standard for the DRS is MODS, a bibliocentric standard from the Library of Congress that supports the general description of many different types of materials. Although MODS does employ a few elements for the description of audio and video materials, it does not natively support metadata for learning objects or the characteristics of sign language. However, MODS includes an extension element that allows any other metadata schema or standard to be inserted directly into the MODS descriptive metadata record. Our original intention was to find an existing standard to support sign language materials and insert the needed elements and values into the MODS extension field. Unfortunately, our research did not uncover any schemas or standards designed specifically to support signed materials. Describing the Terp Talks using metadata that accommodated the sign language materials was crucial to the development of the web portal, so the metadata team agreed to design a schema: a metadata structure with controlled term lists and other parameters that fit the needs of this project.
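As a rough illustration of the extension mechanism described above, the following Python sketch builds a minimal MODS record and places one custom element inside the MODS extension element. The MODS namespace URI is the published one; the `niec` namespace URI and the `signingPace` element name are hypothetical placeholders, not the names used in the actual NIEC schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # published MODS version 3 namespace
NIEC_NS = "http://example.org/niec"     # hypothetical placeholder namespace

ET.register_namespace("mods", MODS_NS)
ET.register_namespace("niec", NIEC_NS)


def build_record(title: str, signing_pace: str) -> ET.Element:
    """Return a MODS record whose extension wraps one custom element."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # Elements from another schema can be nested inside mods:extension.
    extension = ET.SubElement(mods, f"{{{MODS_NS}}}extension")
    custom = ET.SubElement(extension, f"{{{NIEC_NS}}}signingPace")  # hypothetical name
    custom.text = signing_pace
    return mods


record = build_record("Sample Terp Talk", "moderate")
print(ET.tostring(record, encoding="unicode"))
```

The standard MODS elements and the custom elements live in separate namespaces, so a system that only understands MODS can ignore the extension while still reading the title.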
The design process for creating this schema started with learning from the project team about how they had been describing these videos and how they thought the primary audience of the videos would choose to explore them. During this design period, the metadata staff met with the project team regularly to learn more about the materials and their needs. This involved weekly or twice-monthly meetings to discuss how the materials should be described, determine the primary set of required elements for the schema, and create workflows for cataloging the materials. The discussion about how the materials should be described was led by the project team and was mostly informed by a set of user stories and a paper cataloging form. The user stories described in detail how the project team expected users to interact with the Terp Talk portal and were an invaluable tool for teaching the metadata team about the needs of the portal’s intended users. The paper cataloging form, which was originally used by the project team to catalog and organize the recorded videos, helped the metadata team better understand the characteristics of sign language interpreting and provided a foundation to work from for building the schema.
After consulting other learning object metadata schemas, like Learning Object Metadata (LOM), and systems for storing learning objects, like MERLOT, the metadata team started outlining an initial draft for the NIEC metadata schema to support the desired custom descriptive elements, including a series of controlled terms. The primary element set included:
- identifier
- title
- description
- name
- origin
- languageInformation
- contentDescription
- transcript
- series
- comment
Certain characteristics of the interpreters featured in the videos were important to the project team, so the “name” element included a “speakerInformation” field, which allowed the cataloger to select the interpreter’s gender (choosing from: Female, Male, Transgender), age (choosing from: Child, Young Adult, Adult, Ageless (65+)), and race (choosing from: American Indian/Alaskan Native, Asian, Black or African American, Hispanic or Latino, Native Hawaiian/Other Pacific Islander, White, Other).
The language field includes two options for language: signed and spoken. Signed language can include the following descriptive elements, each with their own controlled terms list (with the exception of language):
- language (a string value – text or code)
- signing pace (slow, moderate, fast)
- finger spelling extent (minimal, moderate, extensive)
- finger spelling pace (slow, moderate, fast)
- numbers extent (minimal, moderate, extensive)
- numbers pace (slow, moderate, fast)
- classifiers extent (minimal, moderate, extensive)
- use of space extent (minimal, moderate, extensive)
Similarly, spoken language also has a few descriptive elements with controlled terms lists (with the exception of language):
- language (a string value – text or code)
- speech pace (slow, moderate, fast)
- lends itself to fingerspelling (minimal, moderate, extensive)
- lends itself to classifiers (minimal, moderate, extensive)
- lends itself to numbers (minimal, moderate, extensive)
- lends itself to use of space (minimal, moderate, extensive)
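The signed and spoken language lists above share two small vocabularies: a pace scale and an extent scale. The Python sketch below shows one way such controlled lists could be checked at cataloging time. The camelCase field names are hypothetical spellings of the elements listed above, and the validation logic is an assumption for illustration, not the project's implementation.

```python
PACE = ("slow", "moderate", "fast")
EXTENT = ("minimal", "moderate", "extensive")

# Field names below are hypothetical spellings of the listed elements.
SIGNED_FIELDS = {
    "signingPace": PACE,
    "fingerspellingExtent": EXTENT,
    "fingerspellingPace": PACE,
    "numbersExtent": EXTENT,
    "numbersPace": PACE,
    "classifiersExtent": EXTENT,
    "useOfSpaceExtent": EXTENT,
}
SPOKEN_FIELDS = {
    "speechPace": PACE,
    "lendsItselfToFingerspelling": EXTENT,
    "lendsItselfToClassifiers": EXTENT,
    "lendsItselfToNumbers": EXTENT,
    "lendsItselfToUseOfSpace": EXTENT,
}


def validate(values: dict, fields: dict) -> list:
    """Return a list of problems; an empty list means every value is allowed."""
    problems = []
    for name, value in values.items():
        if name == "language":
            continue  # free string (text or code), so no controlled list applies
        if name not in fields:
            problems.append(f"unknown element: {name}")
        elif value not in fields[name]:
            problems.append(f"{name}: {value!r} is not one of {fields[name]}")
    return problems


print(validate({"language": "ASL", "signingPace": "fast"}, SIGNED_FIELDS))
print(validate({"speechPace": "extensive"}, SPOKEN_FIELDS))
```

Keeping separate field tables for signed and spoken language means an element such as speech pace is rejected if it is entered under the signed language description.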
Content description captures several characteristics of sign language interpreting and the video capture being described, including:
- text type (inquiry/dialog, narrative, expository, persuasive)
- register (frozen, formal, consultative, informal, intimate)
- captions (open captions, closed captions, none)
- conversation type (dialogic, monologic, interactive)
- audience (student, working interpreter, educator)
- duration (unit of time validated with pattern matching)
- overview (free text)
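The content description elements mix three kinds of value: controlled terms, a pattern-validated duration, and free text. The sketch below illustrates how those three kinds might be checked together. The source does not state the duration pattern, so the `HH:MM:SS` format used here is an assumption, as are the camelCase field names.

```python
import re

CONTENT_VOCAB = {
    "textType": ("inquiry/dialog", "narrative", "expository", "persuasive"),
    "register": ("frozen", "formal", "consultative", "informal", "intimate"),
    "captions": ("open captions", "closed captions", "none"),
    "conversationType": ("dialogic", "monologic", "interactive"),
    "audience": ("student", "working interpreter", "educator"),
}

# Assumed format HH:MM:SS; the pattern in the real schema may differ.
DURATION_PATTERN = re.compile(r"\d{2}:[0-5]\d:[0-5]\d")


def check_content_description(desc: dict) -> list:
    """Return a list of error messages for a content description."""
    errors = []
    for name, value in desc.items():
        if name == "overview":
            continue  # free text, nothing to validate
        if name == "duration":
            if not DURATION_PATTERN.fullmatch(value):
                errors.append(f"duration: {value!r} does not match HH:MM:SS")
        elif name not in CONTENT_VOCAB:
            errors.append(f"unknown element: {name}")
        elif value not in CONTENT_VOCAB[name]:
            errors.append(f"{name}: {value!r} is not a controlled term")
    return errors


sample = {"register": "formal", "duration": "00:12:30", "overview": "A short lecture."}
print(check_content_description(sample))
```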
The NIEC schema also contains some general information in addition to the sign language-specific description, including publication and date information, a description, and the video duration. The MODS standard already includes basic descriptive values, like title and creator, but we chose to include similar elements in the NIEC schema to ensure it could stand alone in another system.
We trusted the NIEC team as the experts they were and relied on them to define the controlled values and to help us set the rules for what elements were required, what elements needed minimum or maximum values, and how elements would be nested. For example, the catalogers’ instinct was to describe spoken and signed languages using the same elements and vocabularies, but the project team explained that the characteristics of spoken language (how it is being interpreted by sign language) were distinct from characteristics of signed language (how it is interpreting the spoken language). Using this input, the spoken and signed languages were given their own elements nested in a “languageInformation” element.
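To make the nesting decision concrete, the following sketch serializes a signed and a spoken description as separate children of a `languageInformation` element. Only `languageInformation` comes from the schema as described here; the wrapper names `signedLanguage` and `spokenLanguage` and the child element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def language_information(signed: dict, spoken: dict) -> ET.Element:
    """Nest signed and spoken descriptions under one parent element."""
    root = ET.Element("languageInformation")
    # Wrapper element names are hypothetical, chosen for illustration only.
    for wrapper_name, values in (("signedLanguage", signed), ("spokenLanguage", spoken)):
        wrapper = ET.SubElement(root, wrapper_name)
        for name, value in values.items():
            ET.SubElement(wrapper, name).text = value
    return root


element = language_information(
    signed={"language": "American Sign Language", "signingPace": "moderate"},
    spoken={"language": "English", "speechPace": "fast"},
)
print(ET.tostring(element, encoding="unicode"))
```

Because each language type has its own wrapper, the two descriptions can carry different element sets without one vocabulary being forced onto the other.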
Outcome Analysis
This work led to the successful development of a web portal that uses the DRS API to retrieve the Terp Talk videos and present them to the user in the NIEC’s custom WordPress interface. As originally designed, the site allowed interpreters to explore the Terp Talk materials using the language and keyword terms used by the interpreter community to describe their own work.
Northeastern University Libraries has made the NIEC schema available on GitHub, but because the design was based on the specific needs of the NIEC project, we are not sure how useful it is for other similar projects. If there is interest, we may consider redesigning the schema to adapt it to the broader needs of the interpreter community by adding elements and values to the schema and adding support for other materials. We are also not sure how useful the facets or metadata were to the users of the site, as we are not aware of any usability studies or feedback the project received after the portal was launched.
This project was very localized, and the schema was designed based on the workflow and needs of one particular group; it is therefore likely that the schema and its controlled values are not broadly representative of the needs of all signed language learning objects. This is one of the reasons no outreach was performed after the project was completed to announce the schema or share it with other projects that might be interested in adopting or adapting it for future use. Further review of how users search or browse for sign language materials would be needed to augment the schema to fit a larger audience. Ideally, given more project and staff time, we would have conducted a usability study to determine how users search and browse for sign language materials.
The controlled values created for the schema also pose an issue (particularly the gender, age, and race elements), because the selection of these values was directly influenced by the experience and knowledge of the small group of people who designed the schema. The fixed lists likely do not capture all of the possible values for each element (particularly gender and race) and we continue to be uncertain about the usefulness of the four age categories. More attention should have been paid to these values during the design phase and any further edits to the schema should focus first on the usefulness of these controlled values.
Unfortunately, the NIEC project is no longer active and some of the WordPress website features, including the initial set of custom metadata facets, are no longer available as of 2017 (the videos and their metadata are still securely available in the DRS). The fact that some parts of the project portal are unusable just seven years after the project was initiated demonstrates the need to invest in long-term preservation activities in addition to web publishing activities. To help project teams implement both the presentation and preservation aspects of their projects, Northeastern University Libraries has since developed a set of WordPress plugins to publish DRS materials to library-managed, sustainable WordPress sites for projects similar to NIEC to customize on their own. The project materials are stored securely in the DRS and NUL takes ownership of the WordPress site to make sure it remains usable (read more about CERES here: https://cerestoolkit.dsg.northeastern.edu/).
Next Steps and Research Agenda
The NIEC was one of six federally funded interpreter education centers under the umbrella of the National Consortium of Interpreter Education Centers whose funding ran out in 2016. The project is no longer active or supported and no further work is planned. NUL still stores and maintains the NIEC files and metadata, but no other projects have used the NIEC schema to describe their materials. If the NIEC project is revived or if NUL has the opportunity to support other videos with sign language, the next steps in the project would be:
- Find additional partners: A lack of awareness of the broader needs of sign language materials and their users is a large blind spot for this project. Any additional work on the schema would benefit from the perspective of more voices from the sign language community, including users and creators of sign language materials.
- Perform a usability study: As far as we are aware, no effort was made to solicit feedback from users about the metadata, facets, or the experience of searching or browsing these materials in the WordPress interface. Before the schema can be revised, its usefulness should be evaluated by performing a usability study on a revived or recreated version of the NIEC search and browse interface, paying particular attention to what terms and characteristics users expect to use when browsing a collection of sign language materials.
- Evaluate and revise the schema: The feedback gathered from the usability study should be used to evaluate and revise the existing schema: What elements or vocabulary values are not useful? Are some elements too strict or too flexible? What additional elements of description should be added to the schema? What should be removed?
- Document the schema: Currently, no formal documentation exists to describe the schema or how it can be used to describe sign language materials. Once the schema has been revised, it should be documented to the best of our ability (gathering as much from the original project team as possible) and shared widely.