Levels of Evidence

Scientific research is quite different from legal research. Law is outcome-centric: we begin with a desired result and identify the facts and theories that support it. Science, by contrast, is research-centric: scientists begin with hypotheses but let the research dictate the conclusion. It is a question of which matters more, where you are going or how you get there. Lawyers are concerned with the destination, scientists with the journey. For a true scientific researcher, confirming the hypothesis may be the hope, but it is not essential.

At the crossroads is forensics, the specialty of the expert witness. Each side tries to identify an expert who can demonstrate that his conclusion is scientifically correct (the plaintiff), justifiable (often the civil defendant), or plausible (the criminal defendant).

Not all scientific research is created equal. Although most jurisdictions are governed, or at least influenced, by the Daubert standard (Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 [1993]), experts have their own reliability tests, frequently referred to as “levels of evidence.”

A good example is the “Levels of Evidence” chart produced by the University of Oxford’s Centre for Evidence-Based Medicine (CEBM).

The CEBM “Levels of Evidence 1” document sets out one approach to systematising this process for different question types; its chart is reproduced below.

Level 1a
  Therapy / Prevention, Aetiology / Harm: SR (with homogeneity*) of RCTs
  Prognosis: SR (with homogeneity*) of inception cohort studies; CDR† validated in different populations
  Diagnosis: SR (with homogeneity*) of Level 1 diagnostic studies; CDR† with 1b studies from different clinical centres
  Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of prospective cohort studies
  Economic and decision analyses: SR (with homogeneity*) of Level 1 economic studies

Level 1b
  Therapy / Prevention, Aetiology / Harm: Individual RCT (with narrow Confidence Interval‡)
  Prognosis: Individual inception cohort study with >80% follow-up; CDR† validated in a single population
  Diagnosis: Validating** cohort study with good††† reference standards; or CDR† tested within one clinical centre
  Differential diagnosis / symptom prevalence study: Prospective cohort study with good follow-up****
  Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; systematic review(s) of the evidence; and including multi-way sensitivity analyses

Level 1c
  Therapy / Prevention, Aetiology / Harm: All or none§
  Prognosis: All or none case-series
  Diagnosis: Absolute SpPins and SnNouts††
  Differential diagnosis / symptom prevalence study: All or none case-series
  Economic and decision analyses: Absolute better-value or worse-value analyses††††

Level 2a
  Therapy / Prevention, Aetiology / Harm: SR (with homogeneity*) of cohort studies
  Prognosis: SR (with homogeneity*) of either retrospective cohort studies or untreated control groups in RCTs
  Diagnosis: SR (with homogeneity*) of Level >2 diagnostic studies
  Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of 2b and better studies
  Economic and decision analyses: SR (with homogeneity*) of Level >2 economic studies

Level 2b
  Therapy / Prevention, Aetiology / Harm: Individual cohort study (including low-quality RCT; e.g., <80% follow-up)
  Prognosis: Retrospective cohort study or follow-up of untreated control patients in an RCT; derivation of CDR† or validated on split-sample§§§ only
  Diagnosis: Exploratory** cohort study with good††† reference standards; CDR† after derivation, or validated only on split-sample§§§ or databases
  Differential diagnosis / symptom prevalence study: Retrospective cohort study, or poor follow-up
  Economic and decision analyses: Analysis based on clinically sensible costs or alternatives; limited review(s) of the evidence, or single studies; and including multi-way sensitivity analyses

Level 2c
  Therapy / Prevention, Aetiology / Harm: “Outcomes” Research; Ecological studies
  Prognosis: “Outcomes” Research
  Differential diagnosis / symptom prevalence study: Ecological studies
  Economic and decision analyses: Audit or outcomes research

Level 3a
  Therapy / Prevention, Aetiology / Harm: SR (with homogeneity*) of case-control studies
  Diagnosis: SR (with homogeneity*) of 3b and better studies
  Differential diagnosis / symptom prevalence study: SR (with homogeneity*) of 3b and better studies
  Economic and decision analyses: SR (with homogeneity*) of 3b and better studies

Level 3b
  Therapy / Prevention, Aetiology / Harm: Individual case-control study
  Diagnosis: Non-consecutive study; or without consistently applied reference standards
  Differential diagnosis / symptom prevalence study: Non-consecutive cohort study, or very limited population
  Economic and decision analyses: Analysis based on limited alternatives or costs, poor quality estimates of data, but including sensitivity analyses incorporating clinically sensible variations

Level 4
  Therapy / Prevention, Aetiology / Harm: Case-series (and poor quality cohort and case-control studies§§)
  Prognosis: Case-series (and poor quality prognostic cohort studies***)
  Diagnosis: Case-control study, poor or non-independent reference standard
  Differential diagnosis / symptom prevalence study: Case-series or superseded reference standards
  Economic and decision analyses: Analysis with no sensitivity analysis

Level 5
  Therapy / Prevention, Aetiology / Harm; Prognosis; Diagnosis; Differential diagnosis / symptom prevalence study: Expert opinion without explicit critical appraisal, or based on physiology, bench research or “first principles”
  Economic and decision analyses: Expert opinion without explicit critical appraisal, or based on economic theory or “first principles”
Produced by Bob Phillips, Chris Ball, Dave Sackett, Doug Badenoch, Sharon Straus, Brian Haynes, Martin Dawes since November 1998. Updated by Jeremy Howick March 2009.

There are five different types of analysis in this chart, each with its own “levels of evidence” hierarchy. Those five types of analysis are: 1) Therapy/Prevention, Aetiology/Harm; 2) Prognosis; 3) Diagnosis; 4) Differential diagnosis/symptom prevalence study; and 5) Economic and decision analyses.

Each of these types of analysis treats systematic review as the highest level of evidence and expert opinion as the least reliable. Apparently, expert opinion is to medical science what eyewitness testimony is to attorneys.

Each medical discipline has its own standards with respect to the levels of evidence.

There are two important times for a litigator to understand these various levels: when shopping for an expert witness, and when cross-examining the opposition’s expert witness. Keep in mind that just because an expert’s opinion satisfies the Daubert test doesn’t necessarily mean it represents the highest level of scientific reliability. It is a delicious moment when you can tell the jury that your expert’s opinion rests on a higher level of scientific reliability than your opponent’s expert testimony.

Some medical journals now provide an evidence-level rating in their summaries of the studies they publish. For example, The Journal of Bone & Joint Surgery announced that, beginning in January 2003, it would include a “Level-of-Evidence Rating” in all of its clinical articles. The practice has become increasingly common, typically in the leading journals within each medical discipline.

When working with experts and preparing for trial, we must avoid a “one-size-fits-all” view of levels of evidence. Buttressing or discrediting evidence based upon the hierarchy provided by the leading experts within a discipline is an extraordinarily valuable tool.

For litigators, simply knowing that such a system exists can put you in an advantageous position when making your argument to a judge or a jury.

By: Rodney Warner, J.D.