Trial Magazine
Theme Article
The Big Data Frontier
From developing themes to understanding juror biases, “big data” analytics can help position your client’s case for a strong start at trial.
August 2024
Significant advances in technology in recent history are resulting in vast amounts of information being generated and stored every second from a wide range of sources such as social media, sensors, mobile devices, and the general Internet of Things. This proliferating information is known as “big data.” Big data can be defined as “extremely large and complex datasets that cannot be easily managed or analyzed using traditional data processing methods.”1 “It typically involves the collection, storage, and analysis of massive volumes of information to uncover valuable insights and patterns.”2
Although the term “big data” has come into use only in the last three decades, the collection of such data began in the 1960s and 1970s with market research firms focusing on consumers’ habits.3 For decades, these firms and political organizations have been surveying voters, collecting data, and then interpreting that data to glean voter insights.4 This data has now become so detailed that political consultants can predict how people are going to vote with high accuracy and can microtarget likely and persuadable voters via online ads, phone calls, or even in-person visits.5
Today, a huge variety of entities amass information using data mining (analyzing new or existing large databases) and web scraping (extracting information from the internet)6 for a range of purposes, from improving products and services to monitoring for fraud and compliance.7 Big data is used by private companies and government offices in areas such as banking, health care, education, transportation, and logistics, to name a few.8
Big data can be incorporated in plaintiffs’ cases in several ways. It can be used to make cost and outcome predictions based on past cases9; analyze court records and dockets10; and detect patterns in the behavior of judges, opposing counsel, and experts.11
Analytics can be used to extract critical insights from data, leveraging advanced technologies such as natural language processing, machine learning, and large language models (LLMs).12 They are used to analyze and query discovery materials in one case or across multiple cases. This is particularly helpful in class actions and mass tort litigation with thousands of people bringing cases against the same entity. Depositions, trial transcripts, and other similar materials are excellent sources for big data analytics.
You can apply this technology to trials as well. Use big data analytics in the pretrial research phase, while developing questions for potential jurors, and in the process of juror “deselection” during voir dire.
Let’s delve into some specific ways plaintiff attorneys can use big data.
Big Data and Focus Groups
Focus groups often provide insights on juror decision-making that influence approaches to case strategy and yield concepts and language that can help you develop persuasive trial themes. Big data analytics can improve the process of identifying focus group participants’ influences, insightful comments, and important concepts related to a case. New products and add-ons to existing software can extract language and themes that participants use when discussing a case. Software with these capabilities (currently or coming soon) includes Zoom AI Companion, Dropbox AI, and Microsoft Copilot for Microsoft products, among others.
Vendors focused on the legal industry, such as LexisNexis and Westlaw, are also adding tools that leverage their voluminous legal repositories and AI with features like summarization and contextual searching to provide insights on collected data. While vendors may not advertise these features specifically for focus groups, they can be tremendous time savers.
You can use big data to investigate focus group participants to understand the attitudes that may affect a jury’s decision-making process. The data can show respondent voting behavior, work history, or social attitudes that often translate to decision-making. Big data can also be used to gather insights about participants’ beliefs, values, and decision-making processes across multiple focus group research projects. Many companies provide juror research services using information from public sources and social media to build juror profiles. You can use these same services for participants in pretrial focus groups.
Big Data and Surveys
Surveys involve much larger groups of respondents than are feasible for in-person focus groups and, as a result, provide more statistically reliable data. Attorneys frequently use traditional surveys to support change-in-venue motions, but these surveys also help support, confirm (or disconfirm), and improve focus group, online, and database research. They generally include attitudinal questions, including those related to the respondents’ opinions of lawsuits and lawyers, in addition to their opinions of the parties and specific case issues.
The questions may be asked during in-person interviews, by telephone, or via the internet. The calls or internet invitations are carefully curated so only respondents selected randomly or by quota may respond. The survey results are then entered into a database or other construct and analyzed using statistical programs.
In recent years, online “case surveys” that combine traditional focus group-like features with the greater statistical reliability of surveys have become more common. These case surveys do not usually include group interactions. Instead, respondents are asked to review written case descriptions along with images and videos to supplement the description. The descriptions include pseudonyms and other replacements for non-critical factors (such as changing the location when it is not a factor).
Respondents are asked how different sets of facts would affect their verdicts and damages, and why. By analyzing the trends found in large amounts of data from case surveys, consultants and trial lawyers can refine case presentations. Attorneys can often use common programs such as Excel to sort and filter information from surveys on their own to glean useful information.
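The same kind of sorting and filtering can be scripted when a spreadsheet becomes unwieldy. The following is a minimal sketch in Python, not a description of any vendor's method. The survey rows, the field names (`trusts_corporations`, `verdict`, `damages`), and the helper `summarize_by` are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical case-survey rows; every field name and value is invented.
responses = [
    {"id": 1, "trusts_corporations": "yes", "verdict": "defense", "damages": 0},
    {"id": 2, "trusts_corporations": "no", "verdict": "plaintiff", "damages": 250_000},
    {"id": 3, "trusts_corporations": "no", "verdict": "plaintiff", "damages": 400_000},
    {"id": 4, "trusts_corporations": "yes", "verdict": "plaintiff", "damages": 100_000},
    {"id": 5, "trusts_corporations": "no", "verdict": "defense", "damages": 0},
]


def summarize_by(rows, field):
    """Group rows by one answer; report plaintiff-verdict rate and mean damages."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)

    summary = {}
    for answer, members in groups.items():
        plaintiff = [m for m in members if m["verdict"] == "plaintiff"]
        summary[answer] = {
            "respondents": len(members),
            "plaintiff_rate": len(plaintiff) / len(members),
            # Average damages only among respondents who found for the plaintiff.
            "mean_damages_when_plaintiff": (
                mean(m["damages"] for m in plaintiff) if plaintiff else 0
            ),
        }
    return summary


summary = summarize_by(responses, "trusts_corporations")
for answer, stats in sorted(summary.items()):
    print(answer, stats)
```

With only a handful of rows the numbers mean little; the point is that one grouping function can be reused for any survey question simply by changing the field name.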
Big Data in Jury Selection
Big data on prospective jurors can include neighborhood demographics, views about social or political issues, and personal interests. When there is a relationship between these factors, understanding it can be critical. For example, you might learn that people from certain socioeconomic levels tend to find more often for the plaintiff or the defendant.
It is important to know that using some characteristics in jury selection could violate Batson v. Kentucky, the U.S. Supreme Court decision that prohibits exercising peremptory challenges against jurors based on their race, ethnicity, or sex.13 Be aware of Batson progeny and court rules in your jurisdiction, which may modify Batson procedures. You must avoid using prohibited characteristics as a basis for striking jurors, and you need to be on the alert for improper strikes by the defense.14
Revealing unexpected correlations and juror attitudes. Based on our experience, big data research has shown that certain attitudes about lawsuits, lawyers, corporations, and other matters correlate with anti-plaintiff sentiment. Analysis of this data often reveals top-level correlations between verdicts or damages and attitudes or beliefs that seem to be unrelated to the case at hand.
For example, potential jurors’ membership in a labor union or approval of labor unions could be relevant to the verdict in a case that does not involve labor unions. Military service could affect opinions in a case that has no apparent relationship to military service. Views about conforming to the letter of the law can predict preferences for one litigant over another.
These correlations gleaned from big data don’t just help to identify relevant juror characteristics; they can also shed light on critical topics to address during jury selection to uncover these characteristics about prospective jurors.
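To make the idea of a correlation concrete, here is a minimal sketch that computes a Pearson correlation coefficient between one attitude score and verdict preference. It is illustrative only: the eight data points, the 1-to-5 approval scale, and the variable names are hypothetical, and a real analysis would need a far larger sample and a statistician's judgment about whether the relationship is meaningful.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical responses: approval of labor unions on a 1-5 scale, and
# whether the same respondent favored the plaintiff (1) or the defense (0).
union_approval = [1, 2, 2, 3, 4, 4, 5, 5]
favored_plaintiff = [0, 0, 1, 0, 1, 1, 1, 1]

r = pearson(union_approval, favored_plaintiff)
print(f"correlation: {r:.2f}")
```

A value near 1 or -1 suggests a strong relationship and a value near 0 suggests little or none. A correlation alone does not show that the attitude causes the verdict preference.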
Compiling individual juror profiles. The individual juror profile is a more recent development. This is information specific to each prospective juror, not simply a list of characteristics of the least and most favorable jurors. This information is available because big data is collecting more information on individuals, more companies are selling data to one another to create richer profiles on people, and websites and apps that access huge databases have made it possible to access this data online in seconds.
Several companies specialize in creating juror profiles by combining information from a number of these online resources. Use your favorite search engine with terms such as “juror profiles” and “juror analytics” to find these providers.
It is imperative that the creation of juror profiles not run afoul of state rules of professional conduct15 and ABA Formal Opinion 46616 by going beyond a passive view of the juror’s online profile. Most of these profiling companies use private interfaces that conform to the requirements of ethics rules and the guidelines established in Opinion 466 to retrieve this information. Ask potential profile providers about their guidelines and policies to ensure they are compliant.
Information on individual jurors that you can obtain from big data sources includes: occupation; socioeconomic status; data drawn from public records, such as criminal or bankruptcy records; political leanings and party affiliation; and information drawn from social media posts.
This information can allow you to
- discover hidden risks, biases, and potential leaders on the jury panel
- combine a wealth of public and social data with AI insights and analysis to discover information jurors are not likely to disclose
- gain deeper insights into jurors and experts, refine courtroom strategy, and make better decisions regarding the jury
- obtain key insights into a juror’s personality, opinions, and worldview.
The greatest advantage of big data is access to databases with massive amounts of consumer or political data. Juror profiles developed from political databases are especially helpful. These databases are typically proprietary, so, ordinarily, you must work with the vendor or litigation data firm to access the information.
If an external vendor is not an option, articles about how demographics or attitudes can affect juror behavior provide guidance, with recommendations on how to find this information online. Minimally, a social media search using as much information as is known about a particular juror can provide insights, but resolving name collisions with common names can be a tedious process with a large venire.
Big Data and State-of-the-Art Technologies
Working with big data is impractical for the average user. However, advances in AI and the advent of LLMs, many of which are built on generative pretrained transformer (GPT) architectures, allow lay users to analyze vast amounts of data. These models accept natural language prompts to make sense of the data collected from focus groups, surveys, juror research, and more.
The barrier to entry for leveraging the powerful features available with LLMs is far lower than with traditional analysis techniques. Most people have heard the terms “generative AI” or “chatbot.” These chatbots are front ends to LLMs that allow a user to ask a question in natural language and receive a natural language response that reads as easily as this article. In other words, a user can present a chatbot with data and a task and receive actionable information. In this context, the big data component is the LLM and its vast training base that allows it to interpret our much smaller datasets.
LLMs also have the potential to dramatically simplify the work necessary to use unstructured data as opposed to the more common structured format of information. Structured data exists in a well-known, table-friendly format (spreadsheets) while unstructured data is more abstract. For instance, an address can be in a structured data format that has predefined fields such as name, street number, and street name. Unstructured data could be the response to: “Describe your home’s location.” That open-ended wording might generate replies ranging from very specific GPS coordinates to something as generic as “on a heavily wooded cul-de-sac overlooking a lake.”
The lack of a forced format results in the response being considered unstructured, and this is especially important for attorneys analyzing big data since the information received in the course of litigation will contain more unstructured data than structured data.
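The distinction can be illustrated with a short sketch. The `Address` fields follow the example above (name, street number, street name); the sample values, the coordinates, and the helper functions `on_street` and `mentions` are hypothetical. The keyword check is only a crude stand-in for the interpretation an LLM would perform on free text.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Structured data: every record has the same predefined fields."""
    name: str
    street_number: str
    street_name: str


# A structured record (values invented for illustration).
structured = Address(name="J. Doe", street_number="12", street_name="Lakeview Court")

# Unstructured replies to "Describe your home's location." (invented examples).
unstructured = [
    "30.2672, -97.7431",
    "on a heavily wooded cul-de-sac overlooking a lake",
]


def on_street(records, street_name):
    """Structured data can be filtered directly by field."""
    return [r for r in records if r.street_name == street_name]


def mentions(text, keyword):
    """Free text has no fields; this naive keyword match shows how little
    a simple rule can recover compared with a structured lookup."""
    return keyword.lower() in text.lower()


print(on_street([structured], "Lakeview Court"))
print([reply for reply in unstructured if mentions(reply, "lake")])
```

The structured lookup is exact, while the keyword match misses the reply given as coordinates even if that home also sits on a lake. Gaps of that kind are what make unstructured data laborious to analyze with simple rules.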
The real work for attorneys as users of chatbots is in crafting good prompts to extract the necessary information without compromising veracity or value.17 Use the search term “prompt engineering” to find extensive guides that cover the topic. One of the newer prompting techniques involves asking the chatbot to assume a persona. This technique can improve the chatbot’s responses because the persona acts as a hint that further constrains the chatbot to responses applicable to that persona.
For example, the prompt “please review the attached documents and recommend potential themes for the case” might generate theme ideas that would undermine your client’s case. The prompt “act as a plaintiff personal injury attorney who practices in Texas, and please review the attached documents and recommend potential themes for the case” should elicit more appropriate suggestions for a plaintiff attorney.
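For readers who assemble prompts in a script rather than typing them by hand, the persona technique amounts to prefixing the task with a role. This is a minimal sketch: the helper name `with_persona` is hypothetical, and the step of sending the prompt to a chatbot is deliberately left out because it depends on the product you use.

```python
# The task text and persona are taken from the example prompts above.
TASK = (
    "please review the attached documents and recommend "
    "potential themes for the case"
)


def with_persona(task, persona):
    """Prefix a task with a persona instruction (hypothetical helper)."""
    return f"act as {persona}, and {task}"


prompt = with_persona(
    TASK, "a plaintiff personal injury attorney who practices in Texas"
)
print(prompt)
```

Keeping the task and the persona separate makes it easy to reuse the same task with a different persona, such as an attorney in another practice area or jurisdiction.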
Once you begin to master the art of prompting, the amount of data you can analyze increases dramatically. As a result, you can increase the number of participants in focus groups or the length of a survey and gather more information; for example, a prompt created for 20 participants will likely work unmodified for 2,000 participants.
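One way to picture a prompt scaling from 20 participants to 2,000 is a fixed template applied to batches of comments. The sketch below is an assumption-laden illustration: the template wording, the batch size of 50, and the function names are hypothetical, and batching is included only because chatbots limit how much text a single prompt can contain.

```python
# Hypothetical prompt template; the wording is invented for illustration.
PROMPT_TEMPLATE = (
    "Summarize the main concerns in the following focus group comments "
    "and list recurring words or phrases.\n\nComments:\n{comments}"
)


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompts(comments, batch_size=50):
    """Apply the same unmodified template to every batch of comments."""
    return [
        PROMPT_TEMPLATE.format(comments="\n".join(f"- {c}" for c in batch))
        for batch in batches(comments, batch_size)
    ]


small = build_prompts([f"comment {i}" for i in range(20)])
large = build_prompts([f"comment {i}" for i in range(2000)])
print(len(small), len(large))
```

The template never changes; only the number of prompts grows with the number of participants.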
Chatbots can also generate questions that you can use for focus groups, surveys, and jury selection. Unlike using AI to create documents such as briefs or motions that could contain misleading or inaccurate information,18 AI chatbots are well-suited to formulating questions for collecting potential juror information. While you may struggle to come up with dozens of relevant questions, a chatbot can provide hundreds of potential questions in mere seconds. Not all will be appropriate for a specific case, but they provide a starting point and keep you from having to search furiously through a personal trove of potential questions.
Many chatbots have been trained on data collected from the internet that includes discussions on topics relevant to upcoming trials. After the verdict in a high-profile or significant case, it is not uncommon for the public to make its thoughts known on social media. The questions that people pose online to their followers or the commentary they post on social media detailing their views about why a case was decided the way it was can serve as source material (referred to as “training data”) for AI chatbots to utilize when generating questions for potential jurors.
For example, the Meta Llama LLM19 is trained extensively with content from its various properties such as Facebook and Instagram, which allows derivative tools created based on Meta’s LLM to generate content using information from user posts.20
New LLMs (or updates) are being published all the time, and it’s important to understand what data they were trained with and any specific customizations. In addition to chatbots based on Llama, other LLMs will likely be useful to trial lawyers. Claude,21 by Anthropic, is well-known for its ability to handle scientific and technical data. Gemini,22 by Google, is adept at working with non-text data like images and audio. Mistral,23 by Mistral AI, is specialized to follow instructions and is currently considered one of the fastest to generate its first output token after prompting.24
GPT-4,25 by OpenAI, is the most popular general-purpose LLM26 and has a rich community developing tools based on its chatbot, ChatGPT. A chatbot may have a different name than the LLM behind it, as with ChatGPT and GPT-4, so it is important to check which model a chatbot uses.
AI technology can be a significant instrument of change when it comes to repositories of juror information. However, the introduction of AI into our workflows does not make AI the hammer and every problem a nail. There are articles that discuss, at great length, the various pitfalls that dot the AI landscape in its current state, such as hallucinations,27 violations of confidentiality and privacy regulations,28 and inadvertently biased results.29 The limitations of this technology are something every trial lawyer must keep in mind.
The use of big data—through traditional techniques and AI—has brought juror research into a new era of insight and predictive capabilities. It is a powerful tool that can significantly enhance the chances of plaintiffs prevailing at trial and holding wrongdoers accountable.
Richard Jenson is president of Jenson Research and Communications in Austin, Texas. Jarod Jenson is CEO of Timbre Solutions, Inc., in Dallas-Fort Worth, Texas. Jill Holmquist is president of Forensic Anthropology, Inc., in Lincoln, Neb. They can be reached at rajenson@aol.com, jarod@timbre.solutions, and jill@fai-insight.com, respectively. The views expressed in this article are the authors’ and do not constitute an endorsement of any product or service by AAJ or Trial.
Notes
1. Danny Tobey et al., The Rise of Big Data: Legal Challenges Raised by Artificial Intelligence and Other Data Science Trends, Life Sciences Summit, Oct. 4, 2023, 2, https://tinyurl.com/wkpc95v4.
2. Id.
3. Sherry Tiao, What Is Big Data?, Oracle, Mar. 11, 2024, https://www.oracle.com/big-data/what-is-big-data/.
4. Id.
5. Microtargeting means “to direct tailored advertisements, political messages, etc., at (people) based on detailed information about them (such as what they buy, watch, or respond to on a website).” Microtarget, Merriam-Webster, https://www.merriam-webster.com/dictionary/microtarget; Case Study: Profiling and Elections—How Political Campaigns Know Our Deepest Secrets, Privacy Int’l, Aug. 30, 2017, https://privacyinternational.org/case-study/763/case-study-profiling-and-elections-how-political-campaigns-know-our-deepest-secrets.
6. Understanding Big Data Collection, Inst. of Data, Aug. 28, 2023, https://www.institutedata.com/us/blog/understanding-big-data-collection/. (Note that web scraping raises legal and ethical issues, including compliance with websites’ terms of service and regulatory protections for individual users.)
7. Tiao, supra note 3.
8. How Big Data Is Transforming Industries in Big Ways, 3 Pillar Global, July 7, 2024, https://www.3pillarglobal.com/insights/how-big-data-is-transforming-industries-in-big-ways/.
9. Nan L. Grube, Data Analytics and Artificial Intelligence in Litigation, 78(1) The Mo. Bar (Feb. 8, 2022), https://news.mobar.org/data-analytics-and-artificial-intelligence-in-litigation/; 5 Ways Big Data Is Being Used in the Legal Profession, Analytics Insight, July 11, 2022, https://www.analyticsinsight.net/5-ways-big-data-is-being-used-in-the-legal-profession/.
10. Litigation Analytics: The Types of Data You Need in Court, LexisNexis Insights, May 17, 2023, https://www.lexisnexis.com/community/insights/legal/b/thought-leadership/posts/taking-analytics-to-court.
11. Id.
12. For more on artificial intelligence coverage in Trial in general, see the June 2024 issue.
13. Batson v. Kentucky, 476 U.S. 79 (1986).
14. We note that Arizona has abolished peremptory challenges altogether. Ariz. R. Crim. P. 18.4 and 18.5; Ariz. R. Civ. P. 47(e).
15. Various bar associations have also provided opinions about the ethics of juror research. See, e.g., D.C. Bar Ethics Op. 371 (2016), https://www.dcbar.org/for-lawyers/legal-ethics/ethics-opinions-210-present/ethics-opinion-371; N.Y. City Bar Ass’n Formal Op. 2012-2: Jury Research and Social Media (2012), https://www.nycbar.org/reports/formal-opinion-2012-2-jury-research-and-social-media/.
16. ABA Formal Op. 466 (2014).
17. For more on LLMs, see Alex Freeburg & Erik Dahl, Large Language Model Fundamentals, Trial, Mar. 2024, at 46.
18. John Russell, Sanctions Ordered for Lawyers Who Relied on ChatGPT Artificial Intelligence to Prepare Court Brief, Courthouse News Serv., June 22, 2023, https://www.courthousenews.com/sanctions-ordered-for-lawyers-who-relied-on-chatgpt-artificial-intelligence-to-prepare-court-brief/.
19. Meta Llama 3, https://llama.meta.com/llama3/.
20. Katie Paul, Meta’s New AI Assistant Trained on Public Facebook and Instagram Posts, Reuters, Sept. 28, 2023, https://www.reuters.com/technology/metas-new-ai-chatbot-trained-public-facebook-instagram-posts-2023-09-28/.
21. Anthropic, Claude, https://claude.ai/.
22. Google, Gemini, https://gemini.google.com/app.
23. Mistral AI, https://mistral.ai.
24. Justin Zhao et al., LoRA Land: 310 Fine-tuned LLMs That Rival GPT-4, A Technical Report, arXiv, Apr. 29, 2024, https://arxiv.org/pdf/2405.00732.
25. OpenAI, ChatGPT, https://openai.com/index/gpt-4/.
26. Generative AI Top 150: The World’s Most Used AI Tools (Feb. 2024), FlexOS, https://www.flexos.work/learn/generative-ai-top-150.
27. Matthew Dahl et al., Hallucinating Law: Legal Mistakes With Large Language Models Are Pervasive, Stanford Univ., Human-Centered Artificial Intelligence, Jan. 11, 2024, https://hai.stanford.edu/news/hallucinating-law-legal-mistakes-large-language-models-are-pervasive.
28. Joanne Byron, Part 4: AI and HIPAA Privacy Concerns, Am. Inst. of Healthcare Compliance, https://aihc-assn.org/ai-and-hipaa-privacy-concerns/.
29. Siladitya Ray, Google CEO Says Gemini AI’s ‘Unacceptable’ Responses Offended Users and Showed Bias, Forbes, Feb. 28, 2024, https://www.forbes.com/sites/siladityaray/2024/02/28/google-ceo-says-gemini-ais-unacceptable-responses-offended-users-and-showed-bias/?sh=f448e8811032.