How the team determines that training is complete varies depending upon the
software, the number of documents reviewed, and the results targeted after a
cost-benefit analysis. Under the training process in software commonly
marketed as TAR 1.0,[7]
the software is trained based upon a review and coding of a
subset of relevant and nonrelevant documents, with the resulting predictive model
applied to all unreviewed documents. Here, the goal is not to have humans review
all predicted relevant documents during the TAR process, but instead to review a
smaller proportion of the document set that is most likely to help the software be
reasonably accurate in predicting relevancy on the entire TAR set. The software
selects training documents either randomly or actively (i.e., it selects the documents
whose relevancy it is most uncertain about, those it “thinks” will help it learn the
fastest), with the predictive model updated after each round of training. The
training continues until the predictive model is reasonably accurate in identifying
relevant and nonrelevant documents. At this point, all documents have relevancy
rankings, and a “cut-off” point is identified in the TAR set, with documents ranked at
or above the cut-off point identified as the predicted relevant set, and documents
below the cut-off point as the predicted nonrelevant set.
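For readers who want to see the mechanics, the following short Python sketch illustrates the ranking and cut-off step, using the open-source scikit-learn library as a stand-in for commercial TAR software. The toy documents, the logistic-regression model, and the 0.5 threshold are illustrative assumptions only, not any vendor's actual implementation.

```python
# A minimal sketch of TAR 1.0-style ranking and cut-off. The toy documents,
# model, and threshold are illustrative assumptions, not any vendor's
# implementation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Human-coded training subset (1 = relevant, 0 = nonrelevant).
train_docs = ["notice of contract breach", "cafeteria menu for friday",
              "breach of warranty claim", "office holiday party invite"]
train_labels = [1, 0, 1, 0]

# The unreviewed remainder of the TAR set.
unreviewed = ["damages from the alleged breach", "parking garage notice",
              "warranty dispute correspondence", "gym membership flyer"]

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(train_docs), train_labels)

# Apply the predictive model: every document receives a relevancy ranking.
scores = model.predict_proba(vectorizer.transform(unreviewed))[:, 1]

# The "cut-off" point splits the TAR set: documents ranked at or above it
# form the predicted relevant set, the rest the predicted nonrelevant set.
CUTOFF = 0.5  # illustrative; in practice chosen to meet a target recall
predicted_relevant = [d for d, s in zip(unreviewed, scores) if s >= CUTOFF]
predicted_nonrelevant = [d for d, s in zip(unreviewed, scores) if s < CUTOFF]

# Active ("uncertainty") selection would instead route the documents the
# model is least sure about (scores nearest 0.5) to reviewers next.
next_training_batch = [unreviewed[i] for i in np.argsort(np.abs(scores - 0.5))]
```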
In many TAR 1.0 processes, whether the predictive model is
reasonably accurate is measured against a control set: a
random sample taken from the entire TAR set, typically at the beginning of training,
and designed to be representative of that set. The control set is reviewed
for relevancy by a human reviewer and, as training progresses, the computer’s
classifications of relevance of the control set documents are compared against the
human reviewer’s classifications. When training no longer substantially improves
the computer’s classifications of the control set documents, training is viewed as
having reached completion. At that point, the predictive model’s relevancy decisions
are applied to the unreviewed documents in the TAR set. Under TAR 1.0, the
parameters of a search can be set to target a particular recall rate. It is important to
note, however, that this rate will be achieved regardless of whether the system is well
trained. If the system is undertrained, an unnecessarily large number of nonrelevant
documents will be reviewed to reach the desired recall, but it will be reached. Ceasing
training at the optimal point is not an issue of defensibility (achieving high recall),
but rather a matter of reasonableness: minimizing the cost of reviewing the many extra
nonrelevant documents included in the predicted relevant set.[8]
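The control-set stopping rule described above can be sketched in the same illustrative fashion. The coded documents, the training batches, and the one-percentage-point improvement threshold below are assumptions for illustration, not any vendor's actual criterion.

```python
# A minimal sketch of a TAR 1.0 control-set stopping rule; all data and the
# MIN_GAIN threshold are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Human-coded control set: a random sample drawn from the TAR set at the
# start of training and reviewed for relevance (1 = relevant).
control_docs = ["notice of breach", "cafeteria hours",
                "warranty dispute", "parking memo"]
control_labels = [1, 0, 1, 0]

# Successive rounds of training documents coded by human reviewers.
rounds = [(["breach of contract", "holiday schedule"], [1, 0]),
          (["damages claim letter", "gym membership flyer"], [1, 0]),
          (["warranty breach email", "lunch order form"], [1, 0])]

MIN_GAIN = 0.01  # assumed threshold for "substantial" improvement
seen_docs, seen_labels, previous_accuracy = [], [], 0.0

for docs, labels in rounds:
    seen_docs += docs
    seen_labels += labels
    vectorizer = TfidfVectorizer()
    model = LogisticRegression().fit(
        vectorizer.fit_transform(seen_docs), seen_labels)

    # Compare the model's control-set classifications against the human
    # reviewer's classifications.
    predictions = model.predict(vectorizer.transform(control_docs))
    accuracy = sum(int(p == h) for p, h in
                   zip(predictions, control_labels)) / len(control_labels)

    # When training no longer substantially improves the control-set
    # classifications, training is treated as having reached completion.
    if accuracy - previous_accuracy < MIN_GAIN:
        break
    previous_accuracy = accuracy
```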
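Likewise, the point about targeting a particular recall rate can be made concrete: given a human-coded random sample, the cut-off is lowered until the documents ranked at or above it capture the target share of relevant documents. The sample scores, labels, and eighty percent target below are hypothetical.

```python
# A minimal sketch of setting a cut-off to hit a target recall, estimated
# from a human-coded random sample; the figures are hypothetical.
TARGET_RECALL = 0.80

# (model score, human label) pairs for the coded sample; 1 = relevant.
sample = [(0.95, 1), (0.90, 1), (0.70, 0), (0.60, 1),
          (0.40, 1), (0.30, 0), (0.10, 0)]
total_relevant = sum(label for _, label in sample)

# Walk down the ranking; the cut-off is the lowest score at which the
# documents ranked at or above it capture the target share of relevant
# documents.
found = 0
for score, label in sorted(sample, reverse=True):
    found += label
    if found / total_relevant >= TARGET_RECALL:
        cutoff = score
        break

# An undertrained model ranks relevant documents lower, so the same target
# recall forces a lower cut-off and sweeps many more nonrelevant documents
# into the predicted relevant set; recall is still reached, at greater cost.
print(f"cut-off = {cutoff:.2f}")
```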
[7] It is important to note that the terms TAR 1.0 and 2.0 can be seen as marketing terms with various
meanings. They may not truly reflect the particular processes used by the software, and different
software products use different processes. Rather than relying on the term to understand a particular TAR
workflow, it is more useful and efficient to understand the underlying processes, and in particular,
how training documents are selected and how training completion is determined.
[8] In many TAR 1.0 workflows, this point of reaching optimal results has been known as reaching
“stability.” It is a measurement that reflects whether the software was undertrained at a given point
during the training process. Because the term “stability” has multiple meanings, the term “optimum
results” is used throughout to eliminate potential confusion.