# The Limits of Statistical Methodology: Why A “Statistically Significant” Number of Published Scientific Research Findings are False, #4.

# By Joseph Wayne Smith

# ***

**TABLE OF CONTENTS**

**1.** Introduction

**2.** Troubles in Statistical Paradise

**4.** The Limits of Probability Theory

**5.** Conclusion

# ***

This essay is being published in four installments; this is the fourth and final installment.

But you can also download and read or share a .pdf of the complete text of this essay, including the REFERENCES, by scrolling down to the bottom of this post and clicking on the **Download** tab.

# ***

**4. The Limits of Probability Theory**

There are many unsolved logical problems facing probability theory, especially involving infinite events (Hild, 2000; Shackel, 2007; Hájek, 1997, 2003, 2007). For example, what is the probability of an infinite sequence of heads tossed with an unbiased coin (Williamson, 2007)? Assume that the coin is “fair” by hypothesis. Multiplying the probabilities of the conjuncts yields a sequence converging to a probability of 0. Yet an infinite sequence of heads is a logical possibility. Williamson argues that the use of infinitesimal probabilities does not resolve the contradiction:

*Cantor showed that some natural, apparently compelling forms of reasoning fail for infinite sets. This moral applies to forms of probabilistic and decision-theoretic reasoning in a more radical way than may have been realised. Infinitesimals do not solve the problem.* (Williamson, 2007: p. 179)
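The elementary calculation behind Williamson’s puzzle can be sketched in a few lines of Python (the function name here is mine, for illustration only): the probability of *n* consecutive heads with a fair coin is (1/2)^n, which converges to 0 as *n* grows, even though each finite run, and by extension the infinite run, is a logical possibility.

```python
from fractions import Fraction

def prob_n_heads(n: int) -> Fraction:
    """Probability of n consecutive heads with a fair coin:
    the product of n independent factors of 1/2."""
    return Fraction(1, 2) ** n

# The probabilities shrink toward 0 as the run lengthens.
for n in (1, 10, 100):
    print(n, float(prob_n_heads(n)))
```

The paradox is that taking the limit of this sequence assigns the infinite run probability 0, which the standard theory cannot distinguish from impossibility.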

Another relevant problem is that of the definition of conditional probability as a ratio of unconditional probabilities (Hájek, 2003):

*Pr(A/B) = Pr(A&B)/Pr(B), where Pr(B) > 0.*

Hájek notes that zero-probability events are not necessarily impossible and can be of real scientific interest. He points out that Kolmogorov dealt with this problem by analyzing conditional probability as a random variable. But even here there are problems, because conditional probabilities can be defined in situations where the ratio is undefined, since Pr(A&B) and Pr(B) are themselves undefined. For example, if there is a well-mixed urn with 90 red balls and 10 white balls, the probability of drawing a red ball, given that a ball is drawn at random, is 0.9. However, the ratio analysis gives:

*Pr(X draws a red ball & X draws a ball at random from the urn) / Pr(X draws a ball at random from the urn)*

which has neither a defined numerator nor a defined denominator (Hájek, 2007).
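The ratio analysis itself is easy to sketch in code. In this illustrative Python fragment (the function name and the stipulated probabilities are my own assumptions, not part of Hájek’s argument), the ratio definition yields the intuitive answer only once we stipulate an unconditional probability for the drawing itself; the difficulty Hájek presses is that the original problem supplies no such value.

```python
from fractions import Fraction

def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """Ratio definition of conditional probability:
    Pr(A/B) = Pr(A&B)/Pr(B), defined only when Pr(B) > 0."""
    if p_b == 0:
        raise ValueError("Pr(B) = 0: ratio undefined")
    return p_a_and_b / p_b

# Urn with 90 red and 10 white balls. Stipulate (for illustration)
# that a random draw occurs with probability 1, so both unconditional
# probabilities are defined and the ratio gives the intuitive answer.
p_draw = Fraction(1)                # Pr(a ball is drawn at random)
p_red_and_draw = Fraction(90, 100)  # Pr(red & drawn)
print(conditional(p_red_and_draw, p_draw))  # 9/10
```

Remove the stipulated value for `p_draw` and the ratio has nothing to operate on, which is exactly the point of the objection.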

Apart from these logical problems facing probability, one of the most important unsolved philosophical/methodological problems involving probabilities is the reference class problem: any sentence, event, or proposition can be classified in various ways; hence the probability of the sentence, event, or proposition is dependent upon the classification (Colyvan et al., 2001; Kaye, 2004; Pardo, 2007; Colyvan & Regan, 2007; Rhee, 2007; Allen & Pardo, 2007a). The reference class problem is not merely a problem for probabilistic evidence but, as Roberts explains, is more general:

*Every factual generalisation implies a reference class, and this in turn entails that the reference class problem is an inescapable concomitant of inferential reasoning and fact-finding in legal proceedings. *(Roberts, 2007: p. 245)

Nevertheless, the problem has frequently been discussed in the narrower context of probability problems by leading theorists such as John Venn (Venn, 1876) and Hans Reichenbach (Reichenbach, 1949: p. 374). Although the problem has been regarded by many inductive logicians as providing a decisive refutation of the frequentist interpretation of probability, the reference class problem also arises for the classical, logical, propensity, and subjectivist Bayesian interpretations (Hájek, 2007). The reference class problem has also been discussed in a legal context, and if the problem turns out to be insuperable for even one area of human cognitive activity, then it is a general problem.

The reference class problem has been discussed in the jurisprudential literature in connection with *United States v Shonubi* (1992, 1995, 1997). A Nigerian citizen, Charles Shonubi, was convicted of smuggling heroin into New York through Kennedy airport. Shonubi had made seven previous drug-smuggling trips. Since sentencing is based on the total quantity of drugs smuggled, the prosecution estimated the quantity of heroin smuggled on those prior trips. The US Second Circuit Court of Appeals did not allow the statistical evidence, and consequently Shonubi was sentenced on the basis of the actual quantity of drugs in his possession at the time he was arrested. The statistical estimates were based upon the reference class of other Nigerians smuggling heroin into Kennedy airport using Shonubi’s method of ingesting balloons containing heroin paste. But if use were made of a different reference class to which Shonubi also belonged, a conflicting probability would have been obtained.
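The conflict can be made concrete with a toy calculation. In the sketch below, the class labels and counts are wholly invented for illustration (they are not the actual Shonubi trial data): the same individual belongs to two reference classes whose relative frequencies disagree, and nothing in the frequencies themselves determines which class is the “correct” one.

```python
# Invented counts for two reference classes containing the same
# individual; all numbers are illustrative only.
reference_classes = {
    "class A (same smuggling route and method)": (80, 100),
    "class B (same occupation and background)":  (20, 100),
}

# Each class yields a different "objective" probability for the
# very same individual case.
for name, (favourable, total) in reference_classes.items():
    print(f"{name}: relative frequency = {favourable / total:.2f}")
```

The statistical machinery runs identically over both classes; choosing between the resulting 0.80 and 0.20 is the unsolved philosophical step.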

Ronald J. Allen and Michael S. Pardo, in their paper “The Problematic Value of Mathematical Models of Evidence” (Allen & Pardo, 2007a), have concluded that the reference class problem shows the epistemological limits of mathematical models of evidence for, at least, law:

*The reference-class problem demonstrates that objective probabilities based on a particular class of which an item of evidence is a member cannot typically (and maybe never) capture the probative value of that evidence for establishing facts relating to a specific event. The only class that would accurately capture the ‘objective’ value would be the event itself, which would have a probability of one or zero, respectively.* (Allen & Pardo, 2007a: p. 114).

There may be “practical” solutions to the reference class problem, because people make statistical inferences regularly in daily life (Cheng, 2009: p. 2089). Nevertheless, the theoretical issue, as with making inductive inferences, is to show that such inferences are *justified*. Thus, Mike Redmayne concludes that the reference class problem is not intractable, but merely shows that probability judgments are relative to our evidence pool (Redmayne, 2008: p. 288). Agreed: but the issue in the debate is whether or not a rationally justified choice can be made between *prima facie* plausible but conflicting probabilities generated from different reference classes. Saying that our probability judgments are relative to our evidence pool is true, but only restates the problem: what is the “correct” evidence pool?

**5. Conclusion**

In this essay, I have examined the question raised by John Ioannidis of why most published research findings, primarily in the social and biomedical sciences, are false. There are many reasons for this, such as small sample sizes, and even fraud, which when exposed leads to substantial numbers of papers being retracted. There is also a quality-control issue, whereby journals are reluctant to publish refutations of papers, so that intellectual “rubbish” builds up, just as a creek might get clogged up with weeds. However, as I discussed above, the crisis of statistical methodology is also genuinely important, for if the foundational methodologies are flawed, then we cannot have reasoned faith in the conclusions reached. And that is precisely the situation in disciplines like psychology, as far as much or even most empirical scientific research in those disciplines goes. Therefore, a constructive or healthy skepticism about empirical science is strongly recommended. At the same time, however, since constructive or healthy skepticism is itself a product of human rationality, a cautious optimism about human rationality is also strongly recommended.

**REFERENCES**

(Abelson, 1997). Abelson, R.P. “On the Surprising Longevity of Flogged Horses: Why There is a Case for the Significance Test.” *Psychological Science* 8: 12–15.

(Albert, 2002). Albert, M. “Resolving Neyman’s Paradox.” *British Journal for the Philosophy of Science* 53: 69–76.

(Allen & Pardo, 2007a). Allen, R.J & Pardo, M.S. “The Problematic Value of Mathematical Models of Evidence.” *Journal of Legal Studies* 36: 107–140.

(Allen & Pardo, 2007b). Allen, R.J. & Pardo, M.S. “Probability, Explanation and Inference: A Reply.” *International Journal of Evidence and Proof* 11: 307–317.

(Allen, 1996–1997). Allen, R.J. “Rationality, Algorithms and Juridical Proof: A Preliminary Inquiry.” *International Journal of Evidence and Proof* 1: 254–275.

(Amrhein et al., 2019). Amrhein, V. et al. “Scientists Rise Up Against Statistical Significance.” *Nature*. 20 March. Available online at URL = <https://www.nature.com/articles/d41586-019-00857-9>.

(Anderson et al., 2000). Anderson, D.R. et al. “Null Hypothesis Testing: Problems, Prevalence and an Alternative.” *Journal of Wildlife Management* 64: 912–923.

(Armstrong, 2007). Armstrong, J.S. “Significance Tests Harm Progress in Forecasting.” *International Journal of Forecasting* 23: 321–327.

(Bakan, 1966). Bakan, D. “The Effect of Significance in Psychological Research.” *Psychological Bulletin* 66: 423–437.

(Banasiewicz, 2005). Banasiewicz, A.D. “Marketing Pitfalls of Statistical Significance Testing.” *Marketing Intelligence and Planning* 23: 515–528.

(Barnes, 1999). Barnes, E.C. “The Quantitative Problem of Old Evidence.” *British Journal for the Philosophy of Science *50: 249–264.

(Begley & Ellis, 2012). Begley, C.G. & Ellis, L.M. “Drug Development: Raise Standards for Preclinical Cancer Research.”* Nature* 483: 531–533.

(Bem, 2011). Bem, D. J. “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.” *Journal of Personality and Social Psychology* 100: 407–425.

(Berger & Sellke, 1987). Berger, J.O. & Sellke, T. “Testing a Point Null Hypothesis: The Irreconcilability of *P* Values and Evidence.” *Journal of the American Statistical Association* 82: 112–122.

(Berger & Berry, 1988). Berger, J.O. & Berry, D.A. “Statistical Analysis and the Illusion of Objectivity.” *American Scientist* 76: 159–165.

(Berger et al., 1997). Berger, J. et al. “Unified Frequentist and Bayesian Testing of a Precise Hypothesis.” *Statistical Science* 12: 133–160.

(Bergman & Moore, 1991). Bergman, P. and Moore, A. “Mistrial by Likelihood Ratio: Bayesian Analysis Meets the F-Word.” *Cardozo Law Review* 13: 589–619.

(Berkson, 1938). Berkson, J. “Some Difficulties of Interpretation Encountered in the Application of the Chi-Squared Test.” *Journal of the American Statistical Association* 33: 526–536.

(Button, et al., 2013). Button, K.S. et al. “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience.”* Nature Reviews/Neuroscience* 14: 365–376.

(Carney et al., 2010). Carney, D.R. et al., “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance.” *Psychological Science* 21: 1363–1368.

(Carver, 1978). Carver, R. “The Case Against Statistical Significance Testing.” *Harvard Educational Review* 48: 378–399.

(Cheng, 2009). Cheng, E.K. “A Practical Solution to the Reference Class Problem.” *Columbia Law Review* 109: 2081–2105.

(Chow, 1988). Chow, S.L. “Significance Test or Effect Size?” *Psychological Bulletin* 103: 105–110.

(Chow, 1996). Chow, S.L. *Statistical Significance: Rationale, Validity and Utility*. London: Sage.

(Chow, 1998). Chow, S.L. “Precis of *Statistical Significance: Rationale, Validity and Utility.*” *Behavioral and Brain Sciences *21: 169–239.

(Cohen, 1990). Cohen, J. “Things I Have Learned (So Far).” *American Psychologist* 45: 1304–1312.

(Cohen, 1994). Cohen, J. “The Earth is Round (p < 0.05).” *American Psychologist* 49: 997–1003.

(Colman, 2003). Colman, A.M. “Cooperation, Psychological Game Theory, and Limitations of Rationality in Social interaction.” *Behavioral and Brain Sciences* 26: 139–198.

(Colyvan et al., 2001). Colyvan, M. et al. “Is It a Crime to Belong to a Reference Class?” *Journal of Political Philosophy* 9: 168–181.

(Colyvan & Regan, 2007). Colyvan, M. & Regan, H.M. “Legal Decisions and the Reference Class Problem.” *International Journal of Evidence and Proof *11: 274–285.

(Cumming, 2012). Cumming, G. *Understanding the New Statistics: Effect Sizes, Confidence* *Intervals, and Meta-Analysis*. London: Routledge.

(Cumming, 2014). Cumming, G. “The New Statistics: Why and How.” *Psychological Science* 25: 7–29.

(de Long & Lang, 1992). de Long, J.B. and Lang, K. “Are All Economic Hypotheses False?” *Journal of Political Economy* 100: 1257–1272.

(Diekmann, 2011). Diekmann, A. “Are Most Published Research Findings False?” *Journal of* *Economics and Statistics* 231: 628–635.

(Dienes, 2011). Dienes, Z. “Bayesian Versus Orthodox Statistics; Which Side Are You On?” *Perspectives on Psychological Science* 6: 274–290.

(Earman, 1989). Earman, J. “Old Evidence, New Theories: Two Unresolved Problems in Bayesian Confirmation Theory.” *Pacific Philosophical Quarterly *70: 323–340.

(Eells, 1990). Eells, E. *Bayesian Problems of Old Evidence in Scientific Theories *Minneapolis MN: Univ. of Minnesota Press.

(Eggleston, 1991). Eggleston, R. “Similar Facts and Bayes’ Theorem.” *Jurimetrics Journal* 31: 275–287.

(Everett & Earp, 2015). Everett, J.A.C. & Earp, B.D. “A Tragedy of the (Academic) Commons: Interpreting the Replication Crisis in Psychology as a Social Dilemma for Early-Career Researchers.” *Frontiers in Psychology* 6. 5 August. Available online at URL = <https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2015.01152/full>.

(Fidler et al., 2004). Fidler, F. et al. “Editors Can Lead Researchers to Confidence Intervals, But They Can’t Make Them Think: Statistical Reform Lessons from Medicine.” *Psychological Science* 15: 119–126.

(Fienberg et al., 1995). Fienberg, S.E. et al. “Understanding and Evaluating Statistical Evidence in Litigation.” *Jurimetrics Journal* 36: 1–32.

(Fisher, 1960). Fisher, R.A. *The Design of Experiments* 7th edn., New York: Hafner.

(Freedman et al., 2015). Freedman, L.P. et al. “The Economics of Reproducibility in Preclinical Research.” *PLOS Biology* 13. 9 June. Available online at URL = <https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165>.

(Freedman, 2010). Freedman, D.H. *Wrong.* New York: Little Brown and Company.

(Frick, 1996). Frick, R.W. “The Appropriate Use of Null Hypothesis Testing.” *Psychological Methods* 1: 379–390.

(Garber, 1983). Garber, D. “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In J. Earman (ed.), *Testing Scientific Theories. *Minneapolis MN: Univ. of Minnesota Press. Pp. 99–132.

(Gelman & Stern, 2006). Gelman, A. & Stern, H. “The Difference Between ‘Significant’ and ‘Not Significant’ is Not Itself Statistically Significant.” *American Statistician* 60: 328–331.

(Gelman, 2014). Gelman, A. “The Fallacy of Placing Confidence in Confidence Intervals.” Available online at URL = <http://andrewgelman.com/2014/12/11/fallacy-placing-confidence-confidence-intervals/>.

(Gigerenzer, 1998). Gigerenzer, G. “We Need Statistical Thinking, Not Statistical Rituals.” *Behavioral and Brain Sciences* 21: 199–200.

(Gigerenzer, 2004). Gigerenzer, G. “Mindless Statistics.” *Journal of Socio-Economics* 33: 587–606.

(Gigerenzer, 2018). Gigerenzer, G. “Statistical Rituals: The Replication Delusion and How We Got There.” *Advances in Methods and Practices in Psychological Science* 1, 2: 198–218.

(Glass et al., 1981). Glass, G.V. et al. *Meta-Analysis in Social Research*. Beverly Hills CA: Sage Publications.

(Gliner et al., 2002). Gliner, J. et al. “Problems with Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?” *Journal of Experimental Education* 71: 83–92.

(Glymour, 1980). Glymour, C. *Theory and Evidence.* Princeton NJ: Princeton Univ. Press.

(Godlee et al., 1998). Godlee, F. et al. “Effect on the Quality of Peer Review of Blinding Reviewers and Asking Them to Sign Their Reports: A Randomized Control Trial.” *JAMA* 280: 237–240.

(Goodman, 1993). Goodman, S.N. “*P* Values Hypothesis Tests, and Likelihood: Implications for Epidemiology of a Neglected Historical Debate.” *American Journal of Epidemiology* 137: 485–496.

(Grant, 1962). Grant, D.A. “Testing the Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models.” *Psychological Review* 69: 54–61.

(Greenland, 2011). Greenland, S. “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment.” *Preventive Medicine* 53: 225–228.

(Gunn et al., 2016). Gunn, L.J. et al. “Too Good to be True: When Overwhelming Evidence Fails to Convince.” *Proceedings of the Royal* *Society A* 472. 23 March. Available online at URL = <https://royalsocietypublishing.org/doi/10.1098/rspa.2015.0748>.

(Gunter & Tong, 2016–2017). Gunter, B. & Tong, C. “A Response to the ASA Statement on Statistical Significance and *P*-Values.” *NV-ASA Newsletter* 14: 1–3.

(Guttman, 1985). Guttman, L. “The Illogic of Statistical Inference for Cumulative Science.” *Applied Stochastic Models and Data Analysis* 1: 3–10.

(Hagan, 1997). Hagan, R.L. “In Praise of the Null Hypothesis Statistical Test.” *American Psychologist* 52: 15–24.

(Hájek, 1997). Hájek, A. “Mises Redux-Redux: Fifteen Arguments against Finite Frequentism.” *Erkenntnis* 45: 209–227.

(Hájek, 2003). Hájek, A. “What Conditional Probability Could Not Be.” *Synthese* 137: 273–323.

(Hájek, 2007). Hájek, A. “The Reference Class Problem is Your Problem Too.” *Synthese* 156: 563–585.

(Hanna, 2023a). Hanna, R. “Empirical Science with Uncertainty but Without Reproducibility.” *Against Professional Philosophy*. December 10. Available online at URL = <https://againstprofphil.org/2023/12/10/empirical-science-with-uncertainty-but-without-reproducibility/>.

(Hanna, 2023b). Hanna, R. “The End of Peer Review and the Matrix of Ideas.” *Against Professional Philosophy*. Available online at URL = <https://againstprofphil.org/2023/12/17/the-end-of-peer-review-and-the-matrix-of-ideas/>.

(Harlow, et al., eds, 1997). Harlow, L.L. et al. eds. *What If There Were No Significance Tests?* Mahwah NJ: Lawrence Erlbaum.

(Harman, 1965). Harman, G. “Inference to the Best Explanation.” *Philosophical Review* 74: 88–95.

(Harris, 1997). Harris, R. J. “Significance Tests Have Their Place.” *Psychological Science* 8: 8–11.

(Hartshorne et al., 2012). Hartshorne, J. et al. “Tracking Replicability as a Method of Post-Publication Open Evaluation.” *Frontiers in* *Computational Neuroscience* 6: 1–13.

(Higginson & Munafò, 2016). Higginson, A.D. and Munafò, M.R. “Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions.” *PLOS Biology* 14. 10 November. Available online at URL = <https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2000995>.

(Hild, 2000). Hild, M. “Trends in the Philosophy of Probability.” *Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics* 31: 419–422.

(Hill, 1965). Hill, A.B. “The Environment and Disease: Association or Causation?” *Proceedings of the Royal Society of Medicine* 58: 295–300.

(Hodgson, 1995). Hodgson, D. “Probability: The Logic of the Law — A Response.” *Oxford Journal of Legal Studies* 15: 51–68.

(Holman et al., 2001). Holman, C.L. et al. “A Psychometric Experiment in Causal Inference to Estimate Evidential Weights used by Epidemiologists.” *Epidemiology* 12: 246–255.

(Humburg, 1987). Humburg, J. “The Bayes Rule is Not Sufficient to Justify or Describe Inductive Reasoning.” *Erkenntnis* 26: 379–390.

(Horton, 2015). Horton, R. “Offline: What is Medicine’s 5 Sigma?” *The Lancet* 385: 1380.

(Howson, 1991). Howson, C. “The Old Evidence Problem.” *British Journal for the Philosophy of Science *42: 547–555.

(Howson & Urbach, 2006). Howson, C. & Urbach, P. *Scientific Reasoning: The Bayesian Approach.* La Salle IL: Open Court.

(Hubbard & Bayarri, 2003). Hubbard, R. & Bayarri, M. J. “Confusion Over Measures of Evidence (P’s) Versus Errors (α’s) in Classical Statistical Testing (With Comments).” *American Statistician* 57: 171–182.

(Hubbard et al., 2019). Hubbard, R. et al. “The Limited Role of Formal Statistical Inference in Scientific Inference.” *American Statistician* 73 (S1): 91–98.

(Humphreys, 1988). Humphreys, P. “Non-Nietzschean Decision Making.” In J.H. Fetzer (ed.), *Probability and Causality*. Dordrecht: Kluwer. Pp. 253–268.

(Hunter, 1997). Hunter, J.E. “Needed: A Ban on the Significance Test.” *Psychological Science* (Special Section) 8: 3–7.

(Hurlbert & Lombardi, 2009). Hurlbert, S. H. & Lombardi, C. M. “Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the NeoFisherian.” *Annales Zoologica Fennici* 46: 311–349.

(Hurlbert, et al., 2019). Hurlbert, S. H. et al. “Coup de Grâce for a Tough Old Bull: ‘Statistically Significant’ Expires.” *American Statistician* 73 (S1): 352–357.

(Hylland & Zeckhauser, 1979). Hylland, A. & Zeckhauser, R. “The Impossibility of Bayesian Group Decision Making with Separate Aggregation of Beliefs and Values.” *Econometrica* 47: 1321–1336.

(Ioannidis & Trikalinos, 2005). Ioannidis, J.P.A. & Trikalinos, T. A. “Early Extreme Contradictory Estimates May Appear in Published Research: The Proteus Phenomenon in Molecular Genetics Research and Randomized Trials.” *Journal of Clinical Epidemiology* 58: 543–549.

(Ioannidis, 2005a). Ioannidis, J.P.A. “Why Most Published Research Findings are False.” *PLOS Medicine* 2, 8: 0696–0701.

(Ioannidis, 2005b). Ioannidis, J.P.A. “Contradictions in Highly Cited Research — Reply.” *JAMA* 294: 2695–2696.

(Ioannidis, 2008). Ioannidis, J.P.A. “Why Most Discovered True Associations are Inflated.” *Epidemiology* 19: 640–648.

(Ioannidis, 2014). Ioannidis, J.P.A. et al. “Publication and Other Reporting Biases in Cognitive Sciences: Detection, Prevalence, and Prevention.” *Trends in Cognitive Sciences* 18, 5: 235–241.

(Jefferson et al., 2002). Jefferson, T. et al. “Effects of Editorial Peer Review: A Systematic Review.” *JAMA* 287: 2784–2786.

(Johnson, 1999). Johnson, D.H. “The Insignificance of Statistical Significance Testing.” *Journal of Wildlife Management* 63: 763–772.

(Johnson, 2005). Johnson, D.H. “What Hypothesis Tests are Not: A Response to Colgrave and Ruxton.” *Behavioral Ecology *16: 323–324.

(Kaplan, 1989). Kaplan, M. “Bayesianism without the Black Box.” *Philosophy of Science *56: 48–69.

(Kaye, 1986). Kaye, D.H. “Is Proof of Statistical Significance Relevant?” *Washington Law Review* 61: 1333–1365.

(Kaye, 2004). Kaye, D. H. “Logical Relevance: Problems with the Reference Population and DNA Mixtures in *People v Pizarro.*” *Law, Probability and Risk *3: 211–220.

(Kelly & Glymour, 2004). Kelly, K.T. & Glymour, C. “Why Probability Does Not Capture the Logic of Scientific Justification.” In C. Hitchcock (ed.), *Contemporary Debates in Philosophy of Science*. Malden MA: Blackwell. Pp. 94–114.

(Kelly & Schulte, 1995). Kelly, K. and Schulte, O. “The Computable Testability of Theories with Uncomputable Predictions.” *Erkenntnis* 42: 29–66.

(Kirk, 1996). Kirk, R.F. “Practical Significance: A Concept Whose Time Has Come.” *Educational and Psychological Measurement *56: 746–759.

(Kyburg, 1978). Kyburg, H. “Subjective Probability: Criticisms, Reflections and Problems.” *Journal of Philosophical Logic *7: 157–180.

(Kyburg, 1993). Kyburg, H. “The Scope of Bayesian Reasoning,” in D. Hull et al. (eds.), *Philosophy of Science Association 1992*. East Lansing MI: Philosophy of Science Association. Pp. 139–152.

(Ligertwood, 1996–1997). Ligertwood, A. “Bayesians and the World Out There.” *International Journal of Evidence and Proof* 1: 321–325.

(Lindley, 1957). Lindley, D.V. “A Statistical Paradox.” *Biometrika *44: 187–192.

(Loftus, 1991). Loftus, G.R. “On the Tyranny of Hypothesis Testing in the Social Sciences.” *Contemporary Psychology *36: 102–105.

(Lykken, 1968). Lykken, D.T. “Statistical Significance in Psychological Research.” *Psychological Bulletin* 70: 151–159.

(Martin, 1992). Martin, B. “Scientific Fraud and the Power Structure of Science.” *Prometheus* 10: 83–98.

(McCloskey, 1986). McCloskey, D. “Why Economic Historians Should Stop Relying on Statistical Tests of Significance and Lead Economists and Historians into the Promised Land.” *Newsletter of the Cliometrics Society* 2: 5–7.

(McShane, 2019). McShane, B.B. “Abandon Statistical Significance.” *American Statistician* 73 (S1): 235–245.

(Merton, 1968). Merton, R.K. “The Matthew Effect in Science.”* Science* 159: 56–63.

(Milne, 1991). Milne, P. “Annabel and the Bookmaker: An Everyday Tale of Bayesian Folk.” *Australasian Journal of Philosophy *69: 98–102.

(Moonesinghe et al., 2007). Moonesinghe, R. et al. “Most Published Research Findings are False — But a Little Replication Goes a Long Way.” *PLoS Medicine* 4, 2: 0218–0221.

(Morey et al., 2014). Morey, R.D. et al. “Why Hypothesis Tests are Essential for Psychological Science: A Comment on Cumming.” *Psychological Science* 25: 1289–1290.

(Morey et al., 2016). Morey, R.D. et al. “The Fallacy of Placing Confidence in Confidence Intervals.” *Psychological Bulletin and Review* 23: 103–123.

(Morgan, 2003). Morgan, P.L. “Null Hypothesis Significance Testing: Philosophical and Practical Considerations of a Statistical Controversy.” *Exceptionality* 11: 209–221.

(Morrison & Henkel eds., 1970). Morrison, D.E. and Henkel, R.E. (eds.), *The Significance Test Controversy. *Chicago: Aldine.

(Nickerson, 2000). Nickerson, R.S. “Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy.” *Psychological Methods* 5: 241–305.

(Norton, 2011). Norton, J. D. “Challenges to Bayesian Confirmation Theory.” In P.S. Bandyopadhyay and M. Forster (eds.), *Philosophy of Statistics: Vol. 7, Handbook of the Philosophy of Science*. Amsterdam: North Holland. Pp. 391–437.

(Nunnally, 1960). Nunnally, J. “The Place of Statistics in Psychology.” *Educational and Psychological Measurement* 20: 641–650.

(Oakes, 1986). Oakes, M. *Statistical Inference: A Commentary for the Social and Behavioral Sciences*. New York: Wiley.

(Open Science Collaboration, 2015). Open Science Collaboration. “Estimating the Reproducibility of Psychological Science.” *Science* 349: aac4716–1 — aac4716–8.

(Pardo, 2000). Pardo, M.S. “Juridical Proof, Evidence and Pragmatic Meaning: Toward Evidentiary Holism.” *Northwestern University Law Review* 95: 399–442.

(Pardo, 2007). Pardo, M.S. “Reference Classes and Legal Evidence.” *International Journal of Evidence and Proof* 11: 255–258.

(Peters & Ceci, 1982). Peters, D. & Ceci, S. “Peer-Review Practices of Psychological Journals: The Fate of Submitted Articles, Submitted Again.” *Behavioral and Brain Sciences* 5: 187–255.

(Pratt, 1987). Pratt, J.W. “Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence: Comment.” *Journal of the American Statistical Association* 82: 123–125.

(Rawling, 1999). Rawling, P. “Reasonable Doubt and the Presumption of Innocence: The Case of the Bayesian Juror.” *Topoi* 18: 117–126.

(Redmayne, 2003). Redmayne, M. “Objective Probability and the Assessment of Evidence.” *Law, Probability and Risk* 2: 275–294.

(Redmayne, 2008). Redmayne, M. “Exploring the Proof Paradoxes.” *Legal Theory* 14: 281–309.

(Reichenbach, 1949). Reichenbach, H. *The Theory of Probability*. Berkeley CA: Univ. of California Press.

(Rhee, 2007). Rhee, R.J. “Probability, Policy and the Problem of the Reference Class.” *International Journal of Evidence and Proof* 11: 286–291.

(Roberts, 2007). Roberts, P. “From Theory into Practice: Introducing the Reference Class Problem.” *International Journal of Evidence and Proof* 11: 243–254.

(Rosenthal & Rubin, 1985). Rosenthal, R. and Rubin, D.B. “Statistical Analysis: Summarizing Evidence Versus Establishing Facts.” *Psychological Bulletin* 97: 527–529.

(Rosnow & Rosenthal, 1989). Rosnow, R.L. & Rosenthal, R. “Statistical Procedures and the Justification of Knowledge in Psychological Science.” *American Psychologist* 44: 1267–1284.

(Rozeboom, 1960). Rozeboom, W.W. “The Fallacy of the Null-Hypothesis Significance Test.” *Psychological Bulletin* 57: 416–428.

(Schmidt & Hunter, 2002). Schmidt, F.L. & Hunter, J.E. “Are There Benefits from NHST?” *American Psychologist* 57: 65–66.

(Schmidt, 1991). Schmidt, F.L. “Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers.” *Psychological Methods* 1: 115–129.

(Selvin, 1957). Selvin, H.C. “A Critique of Tests of Significance in Survey Research.” *American Sociological Review* 22: 519–527.

(Shackel, 2007). Shackel, N. “Bertrand’s Paradox and the Principle of Indifference.” *Philosophy of Science* 74: 150–175.

(Shadish & Cook, 1999). Shadish, W. R. & Cook, T. D. “Comment — Design Rules: More Steps Towards a Complete Theory of Quasi-Experimentation.” *Statistical Science* 14: 294–300.

(Shafer, 1986). Shafer, G. “The Construction of Probability Arguments.” *Boston University Law Review* 66: 799–816.

(Shrader-Frechette, 2008). Shrader-Frechette, K. “Statistical Significance in Biology: Neither Necessary Nor Sufficient for Hypothesis Acceptance.” *Biological Theory* 3: 12–16.

(Shrout, 1997). Shrout, P. “Should Significance Tests Be Banned?” *Psychological Science* 8: 1–2.

(Simberloff, 1990). Simberloff, D. “Hypotheses, Errors and Statistical Assumptions.” *Herpetologica* 46: 351–357.

(Simmons, 2011). Simmons, J.P. “False Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” *Psychological Science* 22: 1359–1366.

(Simpson & Orlov, 1979–1980). Simpson, S. and Orlov, M. “Comment: An Application of Logic to the Law.” *University of New South Wales Law Journal* 3: 415–425.

(Smith, 2023). Smith, J. W. “Against the Academics: Peering at the Problem of Peer Review.” *Against Professional Philosophy*. June 18. Available online **HERE**.

(Smith, 1999). Smith, J.W. et al. *The Bankruptcy of Economics*. London: Macmillan.

(Smith & Smith, 2023a). Smith, J.W. & Smith, S. “Corruption, Falsity, and Fraud: The Epistemological Crisis of Professional Academic Research.” *Against Professional Philosophy*. 17 September. Available online **HERE**.

(Smith & Smith, 2023b). Smith, J. W. & Smith, S. “From Scientific Reproducibility to Epistemic Humility.” *Against Professional Philosophy*. 3 December. Available online at URL = <https://againstprofphil.org/2023/12/03/from-scientific-reproducibility-to-epistemic-humility/>.

(Smith, 2006). Smith, R. “Peer Review: A Flawed Process at the Heart of Science and Journals.” *Journal of the Royal Society of Medicine* 99: 178–182.

(Sowden, 1984). Sowden, L. “The Inadequacy of Bayesian Decision Theory,” *Philosophical Studies *45: 293–313.

(Stein, 1996–1997). Stein, A. “Judicial Fact-Finding and the Bayesian Method: The Case for Deeper Scepticism About Their Combination.” *International Journal of Evidence and Proof* 1: 25–47.

(Sterne & Smith, 2001). Sterne, J.A.C. & Smith, G.D. “Sifting the Evidence — What’s Wrong with Significance Tests?” *British Medical Journal* 322: 226–231.

(Suppes, 2007). Suppes, P. “Where do Bayesian Priors Come From?” *Synthese* 156: 441–471.

(Tabarrok, 2005). Tabarrok, A. “Why Most Published Research Findings are False.” *Marginal Revolution*. 2 September. Available online at URL = <http://marginalrevolution.com/marginalrevolution/2005/09/why_most_publis.html>.

*United States v Shonubi*, 802 F Supp 859 (EDNY, 1992).

*United States v Shonubi,* 998 F2d 84 (2d Cir., 1993).

*United States v Shonubi*, 895 F Supp 480 (EDNY, 1995).

*United States v Shonubi*, 962 F. Supp 370 (EDNY, 1997).

*United States v Shonubi*, 103 F3d 1085 (2d Cir, 1997).

(Van Fraassen, 1988). Van Fraassen, B.C. “The Problem of Old Evidence.” In D.F. Austin (ed.), *Philosophical Analysis*. Norwell: Kluwer. Pp. 153–165.

(Venn, 1876). Venn, J. *The Logic of Chance*. 2nd edn., London: Macmillan.

(Vogel, 2011). Vogel, G. “Scientific Misconduct: Psychologist Accused of Fraud on ‘Astonishing Scale’.” *Science* 334: 579.

(Wagner, 1997). Wagner, C.G. “Old Evidence and New Explanation.” *Philosophy of Science* 64: 677–691.

(Wasserstein & Lazar, 2016). Wasserstein, R. L. & Lazar, N.A. “The ASA Statement on *P*-Values: Context, Process, and Purpose.” *American Statistician* 70: 129–133.

(Williamson, 2007). Williamson, T. “How Probable is an Infinite Sequence of Heads?” *Analysis* 67: 173–180.

(Yong, 2015). Yong, E. “How Reliable Are Psychology Studies?” *The Atlantic*. 27 August. Available online at URL = <http://www.theatlantic.com/science/archive/2015/08/psychology-studies-reliability-reproducibility-nosek/4024661>.

(Ziliak & McCloskey, 2008). Ziliak, S.T. & McCloskey, D.N. *The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives.* Ann Arbor MI: Univ. of Michigan Press.

(Zynda, 1995). Zynda, L. “Old Evidence and New Theories.” *Philosophical Studies *77: 67–95.

# ***

# AGAINST PROFESSIONAL PHILOSOPHY REDUX 894

*Mr Nemo, W, X, Y, & Z, Monday 27 May 2024*

*Against Professional Philosophy* is a sub-project of the online mega-project *Philosophy Without Borders*, which is home-based on Patreon **here**.

*Please consider becoming a patron!*