1, 2, 3. Testing, Testing: Better Keyword Search

I give about 50-70 educational presentations each year, so I do a fair number of sound checks.  “Testing, one, two, three.  Testing, testing.”  Scintillating stuff, and hopefully not the highlight of the show.

But “testing, testing” may indeed be the most important point I make, because “testing, testing” should be the mantra of all who use keyword search in e-discovery.  Few actions deliver as much bang for the buck as simple testing of search terms, or do more to forestall boneheaded mistakes.

The tip I share today is one that will cost you little but could save your client or company a lot of time, money and grief.  It’s a capability lawyers can and should have at the ready, on their very own desktops.

Tip: Run proposed searches against an index of terms found on computers that don’t contain responsive information. 

Huh?!?  That DON’T hold responsive ESI?!?

Yup.  On my desk, I have an index of all the terms that appear in a pristine copy of Microsoft Windows and Office–just as they come in a fresh install, before anyone stores a scintilla of evidence on the machine.   Then, when a litigant proposes a flaky search term, I run the proposed term through my “it can’t possibly have any evidence on it” index to see how many hits it generates.  You’d be surprised how many times these noise hits number in the hundreds or even thousands of instances.  De-NISTing helps, but it doesn’t get rid of all the benign stuff that throws off noise hits.

That’s a powerful and persuasive metric to use to push back against ill-advised queries.

It’s not hard to have this capability on your desktop.  All you need is an industrial strength text parser/indexer and (briefly) a disk with a fresh install of Windows and Office, or whatever complement of operating system(s) and application software you prefer.  The only thing that must not be on the disk is any data other than what comes in a fresh install.

To create the index, I suggest you use a tool whose parsing and indexing capabilities are most like the tools used by the leading e-discovery vendors.  Here, my dirt cheap suggestions would be Proof Finder ($100.00 to charity at www.prooffinder.com) or dtSearch ($199.99 at www.dtsearch.com).  I’m partial to Proof Finder, not only because it’s absurdly less expensive than anything else out there that can do what it can do, but also because all proceeds from its sale benefit the literacy charity Room to Read.  As the whiz kid brother of Nuix, the powerful e-discovery and information governance product, Proof Finder is far more industrial strength than its bargain basement pricing suggests.  It’s lightning fast, and unbelievably capable, customizable and easy-to-use.  All that for only $100 to a splendid cause.  And I guarantee you that it will pay for itself many times over, probably the first time you use it.

Once you have either tool (or the tool of your choice) installed, point it at the disk holding nothing but Windows and Office and let the tool chew through the contents to create the index.  Once indexed, you can dispense with the pristine disk and rely on just the index.  Now, when someone floats a keyword search proposal, you can instantly check for noise hits that will be returned but that can’t possibly be evidence.  Simple. Cheap. Powerful.  Persuasive.  Plus, you can quickly explore and test tweaks to the search (like Boolean constructs) that will deep-six the noise hits without eliminating the search altogether.
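
To make the idea concrete, here is a minimal sketch of the workflow in Python.  It is not how Proof Finder or dtSearch work internally; those tools parse hundreds of binary file formats, while this sketch assumes the files are readable as plain text.  The function names (build_index, noise_hits) and the three sample files are made up for illustration, and the "exclude" argument is only a crude stand-in for a Boolean AND NOT.

```python
import os
import re
import tempfile
from collections import defaultdict

# Assumption: a "term" is a run of letters or digits, lowercased.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def build_index(root):
    """Map each lowercased term to the set of file paths under root that contain it."""
    index = defaultdict(set)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as fh:
                    text = fh.read().lower()
            except OSError:
                continue  # skip files that cannot be opened
            for term in set(TOKEN_RE.findall(text)):
                index[term].add(path)
    return index

def noise_hits(index, term, exclude=()):
    """Files containing `term` but none of the `exclude` terms (a crude AND NOT)."""
    hits = set(index.get(term.lower(), set()))
    for word in exclude:
        hits -= index.get(word.lower(), set())
    return hits

# Hypothetical stand-in for the pristine disk: three tiny text files.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "license.txt": "Subject to the terms of this license agreement ...",
        "readme.txt": "See the license agreement before you install.",
        "help.txt": "Press F1 for help with the install.",
    }
    for name, body in samples.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as fh:
            fh.write(body)
    index = build_index(root)

# The index outlives the "disk", just as described above.
print(len(noise_hits(index, "agreement")))                       # 2
print(len(noise_hits(index, "agreement", exclude=["install"])))  # 1
```

Even in this toy, "agreement" hits two files that can't possibly be evidence, and one small tweak to the query cuts that in half.  Against a real install, point build_index at the mounted disk instead of the temporary folder.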

And, no, you’re not making yourself a witness in the case.  If that worries you too much, run the test on your own machine and, if you develop something you need to take to court, have your vendor or the person who will serve as your witness run the test.  Your test results can typically be protected from disclosure as work product if your witness hasn’t relied upon them in preparing to testify.  In my experience, you won’t need an expert witness because the results are so objective and easy to replicate that there’s no basis for the other side to fairly challenge them.

Once you’re comfortable running tests against non-evidence, you’ll quickly become comfortable running ad hoc tests on samples of potentially responsive data from key custodians.  One of the first things I like to do early in a case is secure copies of key custodians’ e-mail collections, then process and index these so that I can use them to test searches.  You don’t need a lot of data from a few well-chosen custodians to flush out really bad queries.  Moreover, having a key custodial mail index at hand is a fantastic early case assessment technique.
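
The same kind of check works on a custodian sample.  Here is a rough sketch, again in Python, that reports what share of sample messages each proposed term would hit.  The hit_rates function and the four messages are invented for illustration, and whole-word matching on plain text is an assumption; a real review platform will tokenize and stem differently.

```python
import re

# Assumption: a "term" is a run of letters or digits, lowercased.
TOKEN_RE = re.compile(r"[a-z0-9]+")

def hit_rates(documents, terms):
    """Return {term: fraction of documents containing the term as a whole word}."""
    tokenized = [set(TOKEN_RE.findall(doc.lower())) for doc in documents]
    total = len(tokenized)
    rates = {}
    for term in terms:
        hits = sum(1 for tokens in tokenized if term.lower() in tokens)
        rates[term] = hits / total if total else 0.0
    return rates

# Hypothetical sample of a key custodian's mail.
sample_mail = [
    "Please review the attached contract before Friday.",
    "Lunch on Friday? Please let me know.",
    "The contract amendment is attached; please sign.",
    "Please forward the quarterly numbers.",
]

rates = hit_rates(sample_mail, ["please", "contract", "amendment"])
for term, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {rate:.0%}")
```

A term that hits every message in the sample, like "please" here, is a really bad query that a few minutes of testing flushes out before anyone pays to review the results.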

So, make “testing, testing” your mantra for keyword search.  It’s really as easy as 1-2-3.

This was originally posted by Craig Ball on July 9, 2012 at https://ballinyourcourt.wordpress.com/2012/07/09/1-2-3-testing-testing-is-that-better/#more-730
