Since the earliest days of eDiscovery, professionals have used litigation software to search for specific words within documents and return individual documents of potential relevance to them for closer inspection. We’re now at a point where those tools and the people who use them need to move beyond searching for certain words – and instead search for meaning, according to leading data science experts.
To dive deeper into this new era in litigation data science, LexisNexis sponsored a webinar hosted by EDRM, a guidelines and standards organization that creates practical resources to improve eDiscovery and information governance.
The guest presenter was Thomas Barnett, special counsel for eDiscovery and data science at Paul Hastings LLP. In addition to being an experienced litigator himself, Mr. Barnett is considered one of the world’s leading experts in the areas of advanced data analytics, predictive coding and technology-assisted review.
Mr. Barnett noted that most of the eDiscovery technology currently available in the legal industry relies primarily on the text of the documents. This is typically done by either matching combinations of words with traditional keyword search, or by searching across all of the words in a document as with predictive coding search.
However, “any good lawyer seeking information about a case considers far more than just the particular words that were used in a communication,” said Barnett. “The background, foundation and context for the information is essential. This is just a basic rule of how we communicate with the people around us every single day, but it is not how the vast majority of current eDiscovery technology works.”
Mr. Barnett explained how – in addition to the text within a document – other information can be analyzed to better understand and derive knowledge from the data. The primary example of this for purposes of eDiscovery to date has been Metadata.
This is the information that accompanies a document but is not part of the document itself, such as the date and time the document was created, when and how it was modified, or when an email was sent and opened.
The new frontier is now Entity Data, which is information contained in the text of documents that can be identified and classified into pre-defined categories. So instead of looking at the text of a document solely as a string of characters, this approach incorporates knowledge about the various elements of the document and asks: “What do the words in the document actually mean?”
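To make the idea of classifying text into pre-defined categories concrete, here is a minimal Python sketch. It is an illustration only, not a tool discussed in the webinar: the category names (`EMAIL`, `MONEY`, `DATE`), the regular expressions and the sample sentence are all assumptions chosen for the example. Production entity extraction generally relies on trained language models rather than hand-written patterns.

```python
import re

# Hypothetical, deliberately simple patterns for three illustrative categories.
# Real entity extraction usually uses trained models, not regular expressions.
ENTITY_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "MONEY": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return (category, matched_text) pairs, grouped in category order."""
    found = []
    for category, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((category, match.group()))
    return found


# Made-up sample sentence, not taken from any real document.
sample = "On 2015-03-02, jane.doe@example.com approved a payment of $12,500.00."
for category, value in extract_entities(sample):
    print(category, value)
```

A keyword search would treat `$12,500.00` as just another string of characters. Tagging it as a monetary amount lets a reviewer ask a different kind of question, such as which documents mention any payment at all.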
Other “layers” of structured data that Mr. Barnett anticipates will be of increasing importance as we move beyond words and toward meaning in eDiscovery include:
- Semantic Layer – this is a level of analysis that maps complex data into familiar business terms, such as “Person” or “Company” or other broad semantic terms, to simplify the complexity of business data;
- Event Layer – this is an advanced data analysis that helps provide contextual understanding of how certain data is related to other pieces of data within a document; and
- Sentiment Layer – this refers to the emotional nature of various words within a document and uses technology to determine whether the opinion being expressed in writing is positive, negative or neutral.
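As a rough illustration of the sentiment layer, the Python sketch below labels a passage positive, negative or neutral by counting words from two small word lists. The word lists, the function name `classify_sentiment` and the scoring rule are assumptions made for this example. They do not describe any specific eDiscovery product, and real sentiment analysis typically uses far larger lexicons or trained models.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"pleased", "great", "agree", "excellent", "thanks"}
NEGATIVE = {"concerned", "angry", "problem", "unacceptable", "worried"}


def classify_sentiment(text):
    """Label text as 'positive', 'negative' or 'neutral' by word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds one to the score; each negative word subtracts one.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("I am very concerned; this delay is unacceptable."))
print(classify_sentiment("Thanks, the draft looks great."))
print(classify_sentiment("The meeting is at noon."))
```

Even a crude score like this shows how sentiment can work as a filter. A reviewer could, for instance, surface the most negative messages in a custodian's mailbox first, before reading them closely.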
“These new areas of advanced data analytics are more than academic exercises for scientists,” said Mr. Barnett. “They help us reconnect at a very basic level with why we do eDiscovery in the first place. The reason we collect and analyze evidence is to help us piece together a story that we can tell to a judge, jury or investigator, based on the facts in front of everyone in the proceeding. Once we get past searching for words in documents and are more effectively searching for meaning in those documents, we’ll be better able to surface the right data we need as litigators to help us tell our story.”