Earlier this year, the Ninth Circuit decided a case that appears likely to have a dangerous impact on a range of defense litigation practice areas that involve statistics-based expert testimony. The opinion states that statistical models can ignore obviously important explanatory variables and still be deemed reliable and admissible. In particular, it says that a regression study of promotions that fails to include a measure of the applicant’s ability is reliable and admissible. Obrey v. Johnson, 400 F.3d 691 (9th Cir. 2005).
This article shows that a regression study that does not employ a proper set of explanatory variables does not survive fundamental scientific analysis and that such a study is neither reliable nor admissible under Daubert; and that, as an example of that general proposition, a regression study of promotions that fails to include a measure of the applicant’s ability is neither reliable nor admissible. The article concludes with an example of how these results can be used to control a wide range of cases where statistical models are used to establish damages or some other required element of the matter.
The analysis that follows is set in the context of three employment law cases, but that is merely serendipity and the principles discussed apply directly to a range of practice areas that rely on statistical and regression analysis. These include patent, antitrust, securities class action, products liability, pharmaceuticals, discrimination, voting rights, many cases with damage estimates, and many contracts cases. Improperly performed and then improperly admitted regression is a cornerstone of junk litigation in most of those practice areas and the article closes with an example that applies the techniques discussed here to a contract matter.
Obrey v. Johnson
In Obrey v. Johnson, 400 F.3d 691 (9th Cir. 2005), a Title VII Plaintiff proffered Rule 702 statistical testimony to show that defendant discriminated against Pacific-Americans in promotions. At the trial court level, Defendant argued that the statistician’s analysis did not include the relative qualifications of the applicants and was therefore flawed into unreliability. The District Court excluded the testimony and the jury returned a defense verdict.
The Ninth Circuit reversed the exclusion, saying that the Plaintiff’s statistical evidence may have failed to account for differences in qualifications, but that did not render it irrelevant or inadmissible.
The first step in analyzing Obrey is to note that the principles of disinterested statistical analysis reject the analysis of the Ninth Circuit. Fundamental statistical analysis requires that regression models be properly specified. In briefest description, a model is properly specified only if the model includes all relevant explanatory variables and excludes all irrelevant explanatory variables. See Kmenta, Elements of Econometrics 161 (Macmillan Pub. Co. 1971). The essence of the statistical analysis applicable to Obrey is that misspecified models do not have the requisite characteristics that generate competent testing and error rate analysis; and that misspecified models do not meet the standards for peer reviewed journals, publication in which is fairly required as a precondition to general acceptance. In short, misspecified models fail all of Daubert’s admissibility factors, making the legal analysis straightforward: misspecified statistical models do not meet Daubert’s criteria for admissibility. Citations to econometrics treatises for these propositions abound and several are provided below; but first, another important case sets the stage for that discussion.
Sheehan v. Daily Racing Form, Inc., 104 F.3d 940 (7th Cir. 1997), is important for present purposes both because it announces the better rule on omitted variables and because in announcing that rule Judge Posner, a skilled analytic statistician in his own right, provides an intuitive discussion of why statistical and regression models that omit relevant explanatory variables are, and must be, inadmissible. This is an important counterpoint to Obrey for litigators charged with excluding unreliable statistical expert testimony in a variety of practice areas.
Sheehan v. Daily Racing Form, Inc.
In Sheehan v. Daily Racing Form, Inc., plaintiff Sheehan was a well-regarded older employee of a racing newspaper company that used manual layout procedures to generate its papers. Defendant Daily Racing Form purchased a second newspaper company that used computerized layout procedures and converted its operations to the computerized techniques.
In subsequent layoffs, Sheehan and most of the other older employees, aged 48 and above, were terminated, while most of the younger workers, aged 42 and below, were retained. Sheehan brought suit for age discrimination and his expert proffered a statistical study that showed a strong correlation between age and the pattern of dismissal. The trial court admitted the testimony. The Seventh Circuit reversed and excluded the expert’s testimony, noting that the expert had failed to consider an obviously important variable, computer skill, as an explanatory variable in his analysis of terminations. The court noted that if Daily Racing Form had terminated employees who lacked computer skills, and the older workers tended to lack computer skills, then a study that omitted computer skills as an explanatory variable would find a correlation between dismissal and age, whether age was a criterion for dismissal or not. Id. at 942.
While the opinion does not identify the type of statistical analysis employed, this failure to include an essential variable in the analysis is an example of the misspecification problem detailed in the econometrics literature. When a regression model omits explanatory variables that are correlated with included explanatory variables, the regression coefficients and their tests and error rate calculations lose the desirable properties that make the law deem them reliable. This is a prime example of why regression that omits an important variable must be excluded by the gatekeeper judge rather than being admitted and going to the weight of the evidence. When important explanatory variables are omitted, the statistical analysis is unreliable. Regression and statistical analyses that omit important explanatory variables reach inaccurate conclusions and appear to be saying things that they are not saying. They not only mislead, they lack the capacity to inform.
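The Seventh Circuit's reasoning can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a reconstruction of the study in Sheehan: the workforce is invented, the probabilities are hypothetical, and the function names (`ols`, `simulate_layoffs`) are made up for this illustration. In the simulated data, termination depends on computer skill alone, and older workers are simply less likely to have that skill.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares of y on X, with an intercept added as column 0."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_layoffs(n=20000, seed=0):
    """Invented workforce: age never enters the termination decision."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(25, 65, n)
    # Hypothetical assumption: the chance of having computer skill falls with age.
    p_skill = np.clip(1.0 - (age - 25) / 50, 0.05, 0.95)
    skill = (rng.uniform(size=n) < p_skill).astype(float)
    # Hypothetical assumption: termination probability depends on skill only.
    p_term = np.where(skill == 1, 0.1, 0.7)
    terminated = (rng.uniform(size=n) < p_term).astype(float)
    return age, skill, terminated


age, skill, terminated = simulate_layoffs()

# Misspecified model: termination regressed on age alone.
age_only = ols(age, terminated)
# Properly specified model: termination regressed on age and computer skill.
with_skill = ols(np.column_stack([age, skill]), terminated)

print("age coefficient, skill omitted: ", round(age_only[1], 4))
print("age coefficient, skill included:", round(with_skill[1], 4))
print("skill coefficient:              ", round(with_skill[2], 4))
```

In this invented data set the age-only model reports a clearly positive age coefficient, even though age plays no part in the simulated decision. Once skill is included, the age coefficient falls to roughly zero. That is the pattern the court described.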
Legal View of Regression and Statistical Analysis
Properly executed regression studies apparently meet all of the Daubert criteria: they perform tests and specify the error rates associated with those tests. They are pervasively published in the peer-reviewed scientific journals of a wide range of scientific disciplines and, properly executed, are a generally accepted scientific research technique in dozens of those disciplines. Regression is widely used in a range of non-litigation settings for purely scientific purposes.
Of course, the fact that properly executed regression studies apparently meet all of the Daubert criteria makes “properly executed” the battleground of admissibility. The Ninth Circuit holds that an expert can omit central variables and still have his work be considered properly executed. The scientific community disagrees with this pronouncement and the balance of this article discusses how lawyers opposing error-ridden regression analysis can exclude that testimony based upon its scientific merits (or lack thereof). These scientific merits begin with whether the expert met the requirements of the regression model.
Lawyering Regression Analysis under Daubert
For lawyers, the central scientific point on regression analysis is that if (and only if) the regression model is properly constructed will the regression estimators have a set of desirable properties that allow statisticians and economists to perform the testing and error rate analysis that is required under Daubert for admissibility in federal courts. Symmetrically, if counsel can establish that the proffered regression model’s requirements have been substantially violated, the scientific basis of the testimony is discredited and the testimony loses evidentiary reliability.
There are two regression problems common in the cases that stem from the substantial violation of the requirements of the regression model: model misspecification and errors in the variables. Obrey is an example of model misspecification, the more complex of the two. A regression model is misspecified if the analyst has, for example, modeled termination rates as depending on age, when those termination rates could depend on computer skill.
Statisticians say that a model is “misspecified” if the true relationship between the two variables of interest is given by one equation, but the economist models the relationship using a different equation that excludes some of the important variables. Kmenta, at 391-405 (discussing model specification and econometric tests to determine if a model is misspecified); see also Judge et al., The Theory and Practice of Econometrics at 407-46 (John Wiley & Sons 1980) (providing an overview of regression model specification tests). Regression estimates from misspecified models are considered scientifically unreliable. This is an important consideration in a range of courtroom situations.
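The standard textbook statement of the omitted-variable problem can be written compactly. The notation below is introduced here for illustration and is not drawn from the opinions or the treatises cited: y is the outcome (for example, termination), x_1 is the included variable (age), and x_2 is the omitted variable (computer skill).

```latex
% True relationship
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i

% Model actually estimated (x_2 omitted)
y_i = b_0 + b_1 x_{1i} + u_i

% Auxiliary regression of the omitted variable on the included one
x_{2i} = \delta_0 + \delta_1 x_{1i} + v_i

% Expected value of the estimated coefficient on x_1
E[\hat{b}_1] = \beta_1 + \beta_2 \, \delta_1
```

In the Sheehan setting, suppose age has no effect on termination (\beta_1 = 0), skill lowers the chance of termination (\beta_2 < 0), and skill declines with age (\delta_1 < 0). The age-only regression would then be expected to report a positive age coefficient equal to \beta_2 \delta_1, although age played no role. The bias disappears only if the omitted variable is irrelevant (\beta_2 = 0) or uncorrelated with the included one (\delta_1 = 0).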
Regression studies that meet the assumptions of the appropriate regression model have a set of desirable characteristics (best, linear, unbiased (BLUE) and consistent), Kmenta at 161, that indicate that testing and error rate analysis done with them should meet the Daubert reliability standards. Tests done with parameters estimated by misspecified models or with inaccurately measured data fail statistically. Therefore, at a minimum they fail the testing, error rate and general acceptance criteria of Daubert. It cannot be over-emphasized that statistical analysis that fails the Daubert standard because it fails statistically should be excluded not simply because it fails to meet a technicality that the Supreme Court has imposed. Statistical analysis that fails the Daubert test for this reason should be excluded from evidence because it is wrong.
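The second problem named above, errors in the variables, can also be sketched numerically. The example below is a hypothetical illustration: the data are simulated, the true slope of 2.0 and the noise levels are arbitrary assumptions, and the helper `slope` is written for this sketch. It shows the classical result that random error in measuring an explanatory variable pulls the estimated coefficient toward zero.

```python
import numpy as np


def slope(x, y):
    """OLS slope from regressing y on x with an intercept."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


rng = np.random.default_rng(1)
n = 20000

# Assumed true relationship: y = 2.0 * x + noise.
true_x = rng.normal(0.0, 1.0, n)
y = 2.0 * true_x + rng.normal(0.0, 1.0, n)

# The analyst observes x only with error; here the error variance is
# assumed equal to the variance of the true variable.
measured_x = true_x + rng.normal(0.0, 1.0, n)

slope_true = slope(true_x, y)
slope_measured = slope(measured_x, y)

print("slope using accurately measured x:  ", round(slope_true, 3))
print("slope using inaccurately measured x:", round(slope_measured, 3))
```

With measurement error as large as the true variation, the estimated slope is roughly half the true value. Any test or error rate calculated from that estimate inherits the distortion.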
The balance of this article discusses a contract law matter that illustrates how a lawyer can control matters that rely on statistical analysis by commanding the statistical analysis.
Controlling Statistics-Based Litigation with Daubert: An Example from Contract Law
Regression-based litigation is everywhere and often uses very sophisticated varieties of regression analysis. A recent contracts matter that relied on a complex regression technique shows how the requirements (statisticians call them assumptions) of the regression model can be used to dispose of litigation.
In this contracts matter the plaintiff proffered a damages expert, Dr. Noll, who used a specialized form of regression that he labeled “Cox Regression.” He explained that Cox Regression is used when the available data do not conform to the requirements of the standard regression model but do conform to a slightly more lax set of requirements. The expert was a Dean and full Professor at a major research university and he claimed that he was using the model in accord with the generally accepted standards of his profession. He even offered the expert opinion that his analysis satisfied Daubert.
Used properly, Cox Regression seems likely to meet the Daubert standards, but defense experts in this contract matter identified several errors in the expert’s methods, including his choice of regressors (explanatory variables), and his systematic errors in measuring the data he relied upon. However, plaintiff’s expert, Dr. Noll, had rationalizations for these errors. Explaining the errors required highly technical, graduate level statistics that would take almost any judge or lawyer beyond the limits of their understanding. That is especially true when the statistics are explained in the nomenclature of experts: standard errors, F-tests, t-tests, z-scores and p-values; one-tail and two-tail tests; consistent and inconsistent estimates and estimators, and so forth. There are intuitive ways of explaining many standard regression and statistics concepts to non-statisticians, but Cox Regression is especially complex and is possessed of few intuitive concepts. To complicate matters, a LEXIS search for “Cox Regression” yielded no hits.
But there are many ways to debunk regression within this article’s context of commanding litigation by commanding statistics. In this instance defense counsel noticed that Cox Regression looks very much like another regression model known as the Proportional Hazards Model, and that there was one reported case on the Proportional Hazards Model.
In Coates v. Johnson & Johnson, 1982 WL 285 (N.D. Ill. 1982), aff’d, 756 F.2d 524 (7th Cir. 1985), plaintiff relied upon the Proportional Hazards Model to establish an essential element of his case. The defense proffered Professor George Neumann of the University of Chicago Business School, one of the developers of the model, who outlined the failures of plaintiff’s expert to meet the requirements (assumptions) of the Proportional Hazards Model and the resultant unreliability of the plaintiff’s expert’s methods. The court excluded the testimony of the plaintiff’s expert.
Now, the list of errors condemned by Dr. Neumann and the Seventh Circuit is very similar to the list of errors made by Dr. Noll in this contracts example. But the fact that Dr. Noll made a series of errors that had been condemned by the court in Coates was only the beginning of Coates’ usefulness: Dr. Noll’s curriculum vitae indicated that when he earned his Ph.D. at the University of Iowa, Dr. Neumann (the prevailing expert in Coates) was a senior econometrics Professor at the University of Iowa. Defense counsel was able to argue that not only did expert Noll’s Cox Regression analysis fail Daubert, it apparently would not even pass the final exam in Professor Neumann’s Econometrics class. The case settled immediately after these issues were briefed.
Maximizing clients’ interests often requires advocates to undertake the complex task of discrediting experts’ statistical or econometric models directly, but sometimes statistically informed lawyering provides easier and more effective avenues for excluding flawed regression testimony. Challenging statistical testimony by applying learned statistical analysis to Daubert issues requires an additional set of tools, but it is a highly cost-effective litigation strategy, able to control substantial litigation with a modest investment of time and expense.
Stephen Mahle is a scientifically trained lawyer who concentrates his practice in litigating Daubert and expert testimony issues for insurance companies and their outside counsel. He has a doctorate in economics, has been a finance professor at several major universities, is webmaster of daubertexpert.com, and lectures and publishes regularly on Daubert and expert testimony issues. He can be reached at firstname.lastname@example.org, or