Provo— BYU Law, a leading national law school focused on leadership in the legal profession, announced it will hold its sixth annual Law and Corpus Linguistics Conference on February 5. The event will virtually convene prominent legal and linguistics scholars, judges and industry professionals interested in furthering the discipline of corpus linguistics – the methodology for understanding the meaning of words at the time they were written by analyzing language in large collections of texts called “corpora.”

In addition to hosting this annual conference, BYU Law develops pioneering legal research corpora and fosters influential scholarship and training using this method. BYU Law will release the public version of its law and corpus linguistics platform and search interface in the first quarter of 2021. The open public beta (lawcorpus.byu.edu) has already been used by thousands of researchers, including federal and state justices and appellate attorneys. The platform enables legal professionals to analyze the meaning of words that can be applied to current cases, and several of the corpora have been referenced in legal opinions.

“It’s an exciting and turbulent time in the world of law as courts continue to seek ways to interpret the meaning of words from important historical rulings and founding-era documents,” said D. Gordon Smith, Dean, BYU Law. “I’m inspired to see how far we’ve come in just a decade since we recognized the potential of corpus linguistics to revolutionize the process of interpretation in the legal space. I expect continued advances in the discipline as we grow our community of interdisciplinary scholars and legal professionals familiar with the practice.”

2021 conference

BYU Law’s sixth annual Law and Corpus Linguistics Conference, sponsored by Schaerr Jaffe, LLP, brings together legal scholars from a range of fields, prominent corpus linguistics scholars, and judges who have employed corpus linguistics analysis in their decisions. The keynote presenter is Tammy Gales, Associate Professor of Linguistics at Hofstra University, who has lectured and published numerous articles on using corpus linguistics as a tool in legal interpretation and forensic linguistics.

The event will include three conference sessions on papers or panel topics addressing best practices in corpus methods, development of new corpora, triangulation using corpus linguistics and other methods, and interpretation applications (statutes, contracts, canons of interpretation, patents, etc.). For more information or to register, visit https://corpusconference.byu.edu/2021-home/.

Following the conference, BYU Law will debut the CAPCorpus, built from Harvard Law School’s Caselaw Access Project (https://case.law/), which includes over 6.7 million cases representing roughly 12 billion words. Covering 360 years of American case law, the corpus will provide avenues for research never before available to law and linguistics researchers. The scope of the data shared by Harvard required a complete rewrite of the underlying law and corpus linguistics research platform. Users will be able to search the entire CAPCorpus or select segments to create more narrowly defined custom corpora. For replicability, searches and results will continue to be savable and linkable to Google Sheets, which can be shared with others to verify research conclusions and cited without fear of alteration after citation.

Interface public release

In tandem with the conference, BYU Law will demonstrate the first formal public release of its law and corpus linguistics research platform, an unprecedented legal technology tool launched in 2018 that makes available the first large-scale data sets of all U.S. Supreme Court rulings and founding-era documents, providing historical context for the usage and meaning of words in legal research. Each corpus contains millions of words from thousands of texts representing language from a relevant period or court.

Beta tested by thousands of users, the first public release will include a simplified interface and a variety of measures to help researchers use the tool, including distribution and adjusted frequency measures. Distribution metrics quantify how evenly a word is spread across the corpus, contrasting words that are evenly distributed with those that clump in a small number of documents. More than 20 new measures have been added, including Gries’ DP and its normalized variant, DP norm. These dispersion measures allow researchers to see not only how often a word appears but how widespread it is. This is useful because words that appear many times in a small handful of texts are significant in different ways from words that appear a few times across a wide range of texts. Unlike the unnormalized DP, the normalized version is bounded by fixed minimum and maximum values, making scores comparable across corpora.
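To make the dispersion idea concrete, here is a minimal sketch of Gries’ DP (deviation of proportions) and its normalized variant on a toy corpus. The function names and the example data are hypothetical illustrations, not the BYU Law platform’s implementation.

```python
def gries_dp(freqs, part_sizes):
    """Gries' DP: half the summed absolute differences between each
    corpus part's expected share of tokens and its observed share of
    the word's occurrences. 0 = perfectly even, values near 1 = clumped.
    freqs[i]: occurrences of the word in part i;
    part_sizes[i]: total tokens in part i."""
    total_tokens = sum(part_sizes)
    total_freq = sum(freqs)
    return 0.5 * sum(
        abs(f / total_freq - s / total_tokens)
        for f, s in zip(freqs, part_sizes)
    )

def gries_dp_norm(freqs, part_sizes):
    """Normalized DP: dividing by (1 - smallest part proportion)
    stretches the score to a fixed [0, 1] range."""
    smallest = min(part_sizes) / sum(part_sizes)
    return gries_dp(freqs, part_sizes) / (1 - smallest)

# Three equally sized documents of 1,000 tokens each.
parts = [1000, 1000, 1000]
evenly = [10, 10, 10]   # word spread across all documents
clumped = [30, 0, 0]    # same total frequency, but in one document

print(gries_dp(evenly, parts))        # 0.0 (perfectly dispersed)
print(gries_dp(clumped, parts))       # ~0.667 (heavily clumped)
print(gries_dp_norm(clumped, parts))  # 1.0 (maximally clumped)
```

Both words occur 30 times, so raw frequency cannot distinguish them; the dispersion score does.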

Corpus linguistics is largely concerned with frequencies, built on the assumption that how frequently (or infrequently) a word appears is meaningful. Frequency information is used in a variety of subdisciplines, from language teachers wanting to prioritize the words their students will most frequently encounter, to legal scholars attempting to determine which meanings of a word are most common or ordinary, to psycholinguists designing experiments to understand language comprehension. Given the central importance of frequency information, it is also critical to consider how words are distributed in the corpus. A word that appears a few times in many different documents differs from a word that appears many times in a single document; a word with high frequency that appears in only one document likely tells us more about that document than about the corpus as a whole. Corpus researchers interested in frequency information usually want to ensure that their data is not skewed by words with low distribution. The new interface will allow users to select the measures most appropriate for the research they are conducting.
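The frequency-versus-distribution contrast above can be sketched with a toy example. The three-document mini-corpus and the word choices here are hypothetical; the point is that two words with identical raw frequency can differ sharply in how many documents they reach.

```python
# A hypothetical three-document mini-corpus.
corpus = [
    "the court held the statute void",
    "the parties agreed the contract stands",
    "habeas habeas habeas habeas",
]

def word_stats(word):
    """Return (total frequency, number of documents containing the word)."""
    docs = [doc.split() for doc in corpus]
    freq = sum(d.count(word) for d in docs)       # raw frequency
    doc_range = sum(1 for d in docs if word in d)  # distribution ("range")
    return freq, doc_range

print(word_stats("the"))     # (4, 2): 4 hits spread over 2 documents
print(word_stats("habeas"))  # (4, 1): 4 hits confined to 1 document
```

Both words occur four times, but "habeas" is concentrated in a single document; a researcher relying on raw frequency alone would miss that difference.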