The Limits of Statistical Methodology: Why A “Statistically Significant” Number of Published Scientific Research Findings are False, #4.

Mr Nemo

May 27, 2024

By Joseph Wayne Smith

***

1. Introduction

2. Troubles in Statistical Paradise

3. A Critique of Bayesianism

4. The Limits of Probability Theory

5. Conclusion

***

The essay that follows below will be published in four installments; this is the fourth and final installment.

But you can also download and read or share a .pdf of the complete text of this essay, including the REFERENCES, by scrolling down to the bottom of this post and clicking on the Download tab.

***

4. The Limits of Probability Theory

There are many unsolved logical problems facing probability theory, especially involving infinite events (Hild, 2000; Shackel, 2007; Hájek, 1997, 2003, 2007). For example, what is the probability of an infinite sequence of heads tossed with an unbiased coin (Williamson, 2007)? Assume that the coin is “fair” by hypothesis. Multiplying the probabilities of the successive, independent tosses yields a sequence of values that converges to 0. Yet an infinite sequence of heads is a logical possibility. Williamson argues that the use of infinitesimal probabilities does not resolve the contradiction:

Cantor showed that some natural, apparently compelling forms of reasoning fail for infinite sets. This moral applies to forms of probabilistic and decision-theoretic reasoning in a more radical way than may have been realised. Infinitesimals do not solve the problem. (Williamson, 2007: p. 179)
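The multiplication step in the coin-tossing example can be written out explicitly. This is only a sketch, assuming (as is standard for a fair coin) that the tosses are independent and that each lands heads with probability 1/2; the symbol H_i, for “toss i lands heads,” is introduced here purely for illustration:

```latex
\[
\Pr(H_1 \,\&\, H_2 \,\&\, \cdots \,\&\, H_n)
  = \prod_{i=1}^{n} \Pr(H_i)
  = \left(\tfrac{1}{2}\right)^{n},
\qquad
\lim_{n \to \infty} \left(\tfrac{1}{2}\right)^{n} = 0 .
\]
```

Since the all-heads outcome entails heads on the first n tosses for every n, its probability can be no greater than (1/2)^n for any n, and so on the standard real-valued treatment it must be 0, even though it remains one of the possible outcomes.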

Another relevant problem is that of the definition of conditional probability as a ratio of unconditional probabilities (Hájek, 2003):

\[ \Pr(A \mid B) = \frac{\Pr(A \,\&\, B)}{\Pr(B)}, \qquad \Pr(B) > 0. \]

Hájek notes that zero probability events are not necessarily impossible and can be of real scientific interest. He points out that Kolmogorov deals with this problem by analyzing conditional probability as a random variable. But even here there are problems, since conditional probabilities can be well defined in situations where the ratio is undefined because Pr(A&B) and Pr(B) are themselves undefined. For example, if there is an urn with 90 red balls and 10 white balls, well mixed, the probability of drawing a red ball given that a ball is drawn at random is 0.9. However, the ratio analysis gives:

\[ \frac{\Pr(X \text{ draws a red ball} \,\&\, X \text{ draws a ball at random from the urn})}{\Pr(X \text{ draws a ball at random from the urn})} \]

in which neither the numerator nor the denominator is defined (Hájek, 2007).
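The contrast between the directly given conditional probability and the ratio analysis can be illustrated with a minimal Python sketch. The function name ratio_conditional, the use of None to stand for “undefined,” and the stipulated value 1/2 for the probability that a ball is drawn at random are all assumptions made here for illustration; none of them comes from Hájek.

```python
from fractions import Fraction
from typing import Optional


def ratio_conditional(pr_a_and_b: Optional[Fraction],
                      pr_b: Optional[Fraction]) -> Optional[Fraction]:
    """Ratio analysis: Pr(A&B) / Pr(B); None when the ratio is undefined."""
    if pr_a_and_b is None or pr_b is None:
        return None  # an unconditional probability is itself undefined
    if pr_b == 0:
        return None  # zero denominator: the ratio gives no answer
    return pr_a_and_b / pr_b


# The conditional probability read directly off the urn's composition.
RED, WHITE = 90, 10
pr_red_given_random_draw = Fraction(RED, RED + WHITE)

# Hypothetical stipulation (not in the essay): pretend Pr(B) = 1/2.
stipulated_pr_b = Fraction(1, 2)
stipulated_pr_a_and_b = pr_red_given_random_draw * stipulated_pr_b

print(pr_red_given_random_draw)                                   # direct
print(ratio_conditional(stipulated_pr_a_and_b, stipulated_pr_b))  # stipulated
print(ratio_conditional(None, None))                # nothing fixes the inputs
print(ratio_conditional(Fraction(0), Fraction(0)))  # zero-probability case
```

Any stipulated positive value for the denominator returns the same 9/10, but the description of the urn supplies no such value; without one, the ratio has nothing to compute, while the conditional probability itself is unproblematic.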

Apart from these logical problems facing probability, one of the most important unsolved philosophical/methodological problems involving probabilities is the reference class problem: any sentence, event, or proposition can be classified in various ways; hence the probability of the sentence, event, or proposition depends upon the classification chosen (Colyvan et al., 2001; Kaye, 2004; Pardo, 2007; Colyvan & Regan, 2007; Rhee, 2007; Allen & Pardo, 2007a). The reference class problem is not merely a problem for probabilistic evidence but, as Roberts explains, is more general:

Every factual generalisation implies a reference class, and this in turn entails that the reference class problem is an inescapable concomitant of inferential reasoning and fact-finding in legal proceedings. (Roberts, 2007: p. 245)

Nevertheless, the problem has frequently been discussed in the narrower context of probability problems by leading theorists such as John Venn (Venn, 1876) and Hans Reichenbach (Reichenbach, 1949: p. 374). Although the problem has been regarded by many inductive logicians as providing a decisive refutation of the frequentist interpretation of probability, the reference class problem also arises for the classical, logical, propensity, and subjectivist Bayesian interpretations (Hájek, 2007). The reference class problem has also been discussed in a legal context, and if the problem turns out to be insuperable for one area of human cognitive activity, then this establishes a general problem.

The reference class problem has been discussed in the jurisprudential literature in connection with the case of United States v Shonubi (1992, 1995, 1997). A Nigerian citizen, Charles Shonubi, was convicted of smuggling heroin into New York through Kennedy airport. Shonubi had made seven previous drug-smuggling trips. Since sentencing is based on the total quantity of drugs smuggled, the prosecution estimated the quantity of heroin smuggled on those prior trips. On appeal, the US Second Circuit Court of Appeals did not allow the statistical evidence. Consequently, Shonubi was sentenced on the basis of the actual quantity of drugs in his possession at the time he was arrested. The statistical data were based upon estimates using the reference class of other Nigerians smuggling heroin into Kennedy airport using Shonubi’s method of ingesting balloons containing heroin paste. But if use were made of a different reference class to which Shonubi also belonged, a conflicting probability would have been obtained.
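How the choice of reference class changes an estimate can be seen in a small Python sketch. The records, field names, and quantities below are entirely hypothetical and are not drawn from the Shonubi litigation; the function estimate_by_class is likewise introduced only for illustration.

```python
from statistics import mean

# Entirely hypothetical records; NOT data from United States v Shonubi.
RECORDS = [
    {"route": "north", "method": "ingested", "grams": 400},
    {"route": "north", "method": "ingested", "grams": 500},
    {"route": "north", "method": "luggage", "grams": 1500},
    {"route": "south", "method": "ingested", "grams": 300},
    {"route": "south", "method": "luggage", "grams": 1000},
]


def estimate_by_class(records, **criteria):
    """Mean quantity over the records that match every given criterion."""
    matching = [
        r["grams"]
        for r in records
        if all(r[key] == value for key, value in criteria.items())
    ]
    return mean(matching)


# One individual who used the "north" route and the "ingested" method
# belongs to all three classes, yet each class yields a different estimate.
print(estimate_by_class(RECORDS, route="north"))
print(estimate_by_class(RECORDS, method="ingested"))
print(estimate_by_class(RECORDS, route="north", method="ingested"))
```

Nothing in the data says which of the three classes is the right one for the individual in question; that choice has to be justified on other grounds, which is the reference class problem in miniature.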

Ronald J. Allen and Michael S. Pardo, in their paper “The Problematic Value of Mathematical Models of Evidence” (Allen & Pardo, 2007a), have concluded that the reference class problem shows the epistemological limits of mathematical models of evidence for, at least, law:

The reference-class problem demonstrates that objective probabilities based on a particular class of which an item of evidence is a member cannot typically (and maybe never) capture the probative value of that evidence for establishing facts relating to a specific event. The only class that would accurately capture the ‘objective’ value would be the event itself, which would have a probability of one or zero, respectively. (Allen & Pardo, 2007a: p. 114).

There may be “practical” solutions to the reference class problem, because people make statistical inferences regularly in daily life (Cheng, 2009: p. 2089). Nevertheless, the theoretical issue, like that of making inductive inferences, is to show that such inferences are justified. Mike Redmayne, for example, concludes that the reference class problem is not intractable, but merely shows that probability judgments are relative to our evidence pool (Redmayne, 2008: p. 288). Agreed: but the issue in the debate is whether or not a rationally justified choice can be made between prima facie plausible but conflicting probabilities generated from different reference classes. Saying that our probability judgments are relative to our evidence pool is true, but it only restates the problem: what is the “correct” evidence pool?

5. Conclusion

In this essay, I have examined the question raised by John Ioannidis, of why most published research findings, primarily in the social and biomedical sciences, are false. There are many reasons for this, such as small sample sizes, and even fraud, which, when exposed, leads to substantial numbers of papers being retracted. There is also a quality control issue, whereby journals are reluctant to publish refutations of papers, so that there is a build-up of intellectual “rubbish,” just as a creek might get clogged up with weeds. However, as I discussed above, the crisis of statistical methodology is also genuinely important, for if the foundational methodologies are flawed, then we cannot have reasoned faith in the conclusions reached. And that is precisely the situation in disciplines like psychology, for example, as far as much or even most empirical scientific research in those disciplines goes. Therefore, a constructive or healthy skepticism about empirical science is strongly recommended. At the same time, however, since constructive or healthy skepticism is itself a product of human rationality, a cautious optimism about human rationality is also strongly recommended.

REFERENCES

(Abelson, 1997). Abelson, R.P. “On the Surprising Longevity of Flogged Horses: Why There is a Case for the Significance Test.” Psychological Science 8: 12–15.

(Albert, 2002). Albert, M. “Resolving Neyman’s Paradox.” British Journal for the Philosophy of Science 53: 69–76.

(Allen & Pardo, 2007a). Allen, R.J. & Pardo, M.S. “The Problematic Value of Mathematical Models of Evidence.” Journal of Legal Studies 36: 107–140.

(Allen & Pardo, 2007b). Allen, R.J. & Pardo, M.S. “Probability, Explanation and Inference: A Reply.” International Journal of Evidence and Proof 11: 307–317.

(Allen, 1996–1997). Allen, R.J. “Rationality, Algorithms and Juridical Proof: A Preliminary Inquiry.” International Journal of Evidence and Proof 1: 254–275.

(Amrhein et al., 2019). Amrhein, V. et al. “Scientists Rise Up Against Statistical Significance.” Nature. 20 March. Available online at URL = <https://www.nature.com/articles/d41586-019-00857-9>.

(Anderson et al., 2000). Anderson, D.R. et al. “Null Hypothesis Testing: Problems, Prevalence and an Alternative.” Journal of Wildlife Management 64: 912–923.

(Armstrong, 2007). Armstrong, J.S. “Significance Tests Harm Progress in Forecasting.” International Journal of Forecasting 23: 321–327.

(Bakan, 1966). Bakan, D. “The Effect of Significance in Psychological Research.” Psychological Bulletin 66: 423–437.

(Banasiewicz, 2005). Banasiewicz, A.D. “Marketing Pitfalls of Statistical Significance Testing.” Marketing Intelligence and Planning 23: 515–528.

(Barnes, 1999). Barnes, E.C. “The Quantitative Problem of Old Evidence.” British Journal for the Philosophy of Science 50: 249–264.

(Begley & Ellis, 2012). Begley, C.G. & Ellis, L.M. “Drug Development: Raise Standards for Preclinical Cancer Research.” Nature 483: 531–533.

(Bem, 2011). Bem, D. J. “Feeling the Future: Experimental Evidence for Anomalous Retroactive Influences on Cognition and Affect.” Journal of Personality and Social Psychology 100: 407–425.

(Berger & Sellke, 1987). Berger, J.O. & Sellke, T. “Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence.” Journal of the American Statistical Association 82: 112–122.

(Berger & Berry, 1988). Berger, J.O. & Berry, D.A. “Statistical Analysis and the Illusion of Objectivity.” American Scientist 76: 159–165.

(Berger et al., 1997). Berger, J. et al. “Unified Frequentist and Bayesian Testing of a Precise Hypothesis.” Statistical Science 12: 133–160.

(Bergman & Moore, 1991). Bergman, P. and Moore, A. “Mistrial by Likelihood Ratio: Bayesian Analysis Meets the F-Word.” Cardozo Law Review 13: 589–619.

(Berkson, 1938). Berkson, J. “Some Difficulties of Interpretation Encountered in the Application of the Chi-Squared Test.” Journal of the American Statistical Association 33: 526–536.

(Button et al., 2013). Button, K.S. et al. “Power Failure: Why Small Sample Size Undermines the Reliability of Neuroscience.” Nature Reviews Neuroscience 14: 365–376.

(Carney et al., 2010). Carney, D.R. et al. “Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance.” Psychological Science 21: 1363–1368.

(Carver, 1978). Carver, R. “The Case Against Statistical Significance Testing.” Harvard Educational Review 48: 378–399.

(Cheng, 2009). Cheng, E.K. “A Practical Solution to the Reference Class Problem.” Columbia Law Review 109: 2081–2105.

(Chow, 1988). Chow, S.L. “Significance Test or Effect Size?” Psychological Bulletin 103: 105–110.

(Chow, 1996). Chow, S.L. Statistical Significance: Rationale, Validity and Utility. London: Sage.

(Chow, 1998). Chow, S.L. “Precis of Statistical Significance: Rationale, Validity and Utility.” Behavioral and Brain Sciences 21: 169–239.

(Cohen, 1990). Cohen, J. “Things I Have Learned (So Far).” American Psychologist 45: 1304–1312.

(Cohen, 1994). Cohen, J. “The Earth is Round (p < 0.05).” American Psychologist 49: 997–1003.

(Colman, 2003). Colman, A.M. “Cooperation, Psychological Game Theory, and Limitations of Rationality in Social interaction.” Behavioral and Brain Sciences 26: 139–198.

(Colyvan et al., 2001). Colyvan, M. et al. “Is It a Crime to Belong to a Reference Class?” Journal of Political Philosophy 9: 168–181.

(Colyvan & Regan, 2007). Colyvan, M. & Regan, H.M. “Legal Decisions and the Reference Class Problem.” International Journal of Evidence and Proof 11: 274–285.

(Cumming, 2012). Cumming, G. Understanding the New Statistics: Effect Sizes, Confidence Intervals, and Meta-Analysis. London: Routledge.

(Cumming, 2014). Cumming, G. “The New Statistics: Why and How.” Psychological Science 25: 7–29.

(de Long & Lang, 1992). de Long, J.B. and Lang, K. “Are All Economic Hypotheses False?” Journal of Political Economy 100: 1257–1272.

(Diekmann, 2011). Diekmann, A. “Are Most Published Research Findings False?” Journal of Economics and Statistics 231: 628–635.

(Dienes, 2011). Dienes, Z. “Bayesian Versus Orthodox Statistics; Which Side Are You On?” Perspectives on Psychological Science 6: 274–290.

(Earman, 1989). Earman, J. “Old Evidence, New Theories: Two Unresolved Problems in Bayesian Confirmation Theory.” Pacific Philosophical Quarterly 70: 323–340.

(Eells, 1990). Eells, E. “Bayesian Problems of Old Evidence.” In C.W. Savage (ed.), Scientific Theories. Minneapolis MN: Univ. of Minnesota Press.

(Eggleston, 1991). Eggleston, R. “Similar Facts and Bayes’ Theorem.” Jurimetrics Journal 31: 275–287.

(Everett & Earp, 2015). Everett, J.A.C. & Earp, B.D. “A Tragedy of the (Academic) Commons: Interpreting the Replication Crisis in Psychology as a Social Dilemma for Early-Career Researchers.” Frontiers in Psychology 6. 5 August. Available online at URL = <https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2015.01152/full>.

(Fidler et al., 2004). Fidler, F. et al. “Editors Can Lead Researchers to Confidence Intervals, But They Can’t Make Them Think: Statistical Reform Lessons from Medicine.” Psychological Science 15: 119–126.

(Fienberg et al., 1995). Fienberg, S.E. et al. “Understanding and Evaluating Statistical Evidence in Litigation.” Jurimetrics Journal 36: 1–32.

(Fisher, 1960). Fisher, R.A. The Design of Experiments 7th edn., New York: Hafner.

(Freedman et al., 2015). Freedman, L.P. et al. “The Economics of Reproducibility in Preclinical Research.” PLOS Biology 13. 9 June. Available online at URL = <https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165>.

(Freedman, 2010). Freedman, D.H. Wrong. New York: Little Brown and Company.

(Frick, 1996). Frick, R.W. “The Appropriate Use of Null Hypothesis Testing.” Psychological Methods 1: 379–390.

(Garber, 1983). Garber, D. “Old Evidence and Logical Omniscience in Bayesian Confirmation Theory.” In J. Earman (ed.), Testing Scientific Theories. Minneapolis MN: Univ. of Minnesota Press. Pp. 99–132.

(Gelman & Stern, 2006). Gelman, A. & Stern, H. “The Difference Between ‘Significant’ and ‘Not Significant’ is Not Itself Statistically Significant.” American Statistician 60: 328–331.

(Gelman, 2014). Gelman, A. “The Fallacy of Placing Confidence in Confidence Intervals.” Available online at URL = <http://andrewgelman.com/2014/12/11/fallacy-placing-confidence-confidence-intervals/>.

(Gigerenzer, 1998). Gigerenzer, G. “We Need Statistical Thinking, Not Statistical Rituals.” Behavioral and Brain Sciences 21: 199–200.

(Gigerenzer, 2004). Gigerenzer, G. “Mindless Statistics.” Journal of Socio-Economics 33: 587–606.

(Gigerenzer, 2018). Gigerenzer, G. “Statistical Rituals: The Replication Delusion and How We Got There.” Advances in Methods and Practices in Psychological Science 1, 2: 198–218.

(Glass et al., 1981). Glass, G.V. et al. Meta-Analysis in Social Research. Beverly Hills CA: Sage Publications.

(Gliner et al., 2002). Gliner, J. et al. “Problems with Null Hypothesis Significance Testing (NHST): What Do the Textbooks Say?” Journal of Experimental Education 71: 83–92.

(Glymour, 1980). Glymour, C. Theory and Evidence. Princeton NJ: Princeton Univ. Press.

(Godlee et al., 1998). Godlee, F. et al. “Effect on the Quality of Peer Review of Blinding Reviewers and Asking Them to Sign Their Reports: A Randomized Control Trial.” JAMA 280: 237–240.

(Goodman, 1993). Goodman, S.N. “P Values, Hypothesis Tests, and Likelihood: Implications for Epidemiology of a Neglected Historical Debate.” American Journal of Epidemiology 137: 485–496.

(Grant, 1962). Grant, D.A. “Testing the Null Hypothesis and the Strategy and Tactics of Investigating Theoretical Models.” Psychological Review 69: 54–61.

(Greenland, 2011). Greenland, S. “Null Misinterpretation in Statistical Testing and Its Impact on Health Risk Assessment.” Preventive Medicine 53: 225–228.

(Gunn et al., 2016). Gunn, L.J. et al. “Too Good to be True: When Overwhelming Evidence Fails to Convince.” Proceedings of the Royal Society A 472. 23 March. Available online at URL = <https://royalsocietypublishing.org/doi/10.1098/rspa.2015.0748>.

(Gunter & Tong, 2016–2017). Gunter, B. & Tong, C. “A Response to the ASA Statement on Statistical Significance and P-Values.” NV-ASA Newsletter 14: 1–3.

(Guttman, 1985). Guttman, L. “The Illogic of Statistical Inference for Cumulative Science.” Applied Stochastic Models and Data Analysis 1: 3–10.

(Hagan, 1997). Hagan, R.L. “In Praise of the Null Hypothesis Statistical Test.” American Psychologist 52: 15–24.

(Hájek, 1997). Hájek, A. “Mises Redux-Redux: Fifteen Arguments against Finite Frequentism.” Erkenntnis 45: 209–227.

(Hájek, 2003). Hájek, A. “What Conditional Probability Could Not Be.” Synthese 137: 273–323.

(Hájek, 2007). Hájek, A. “The Reference Class Problem is Your Problem Too.” Synthese 156: 563–585.

(Hanna, 2023a). Hanna, R. “Empirical Science with Uncertainty but Without Reproducibility.” Against Professional Philosophy. December 10. Available online at URL = <https://againstprofphil.org/2023/12/10/empirical-science-with-uncertainty-but-without-reproducibility/>.

(Hanna, 2023b). Hanna, R. “The End of Peer Review and the Matrix of Ideas.” Against Professional Philosophy. Available online at URL = <https://againstprofphil.org/2023/12/17/the-end-of-peer-review-and-the-matrix-of-ideas/>.

(Harlow et al., eds., 1997). Harlow, L.L. et al. (eds.), What If There Were No Significance Tests? Mahwah NJ: Lawrence Erlbaum.

(Harman, 1965). Harman, G. “Inference to the Best Explanation.” Philosophical Review 74: 88–95.

(Harris, 1997). Harris, R. J. “Significance Tests Have Their Place.” Psychological Science 8: 8–11.

(Hartshorne et al., 2012). Hartshorne, J. et al. “Tracking Replicability as a Method of Post-Publication Open Evaluation.” Frontiers in Computational Neuroscience 6: 1–13.

(Higginson & Munafò, 2016). Higginson, A.D. and Munafò, M.R. “Current Incentives for Scientists Lead to Underpowered Studies with Erroneous Conclusions.” PLOS Biology 14. 10 November. Available online at URL = <https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.2000995>.

(Hild, 2000). Hild, M. “Trends in the Philosophy of Probability.” Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics 31: 419–422.

(Hill, 1965). Hill, A.B. “The Environment and Disease: Association or Causation?” Proceedings of the Royal Society of Medicine 58: 295–300.

(Hodgson, 1995). Hodgson, D. “Probability: The Logic of the Law — A Response.” Oxford Journal of Legal Studies 15: 51–68.

(Holman et al., 2001). Holman, C.L. et al. “A Psychometric Experiment in Causal Inference to Estimate Evidential Weights used by Epidemiologists.” Epidemiology 12: 246–255.

(Humburg, 1987). Humburg, J. “The Bayes Rule is Not Sufficient to Justify or Describe Inductive Reasoning.” Erkenntnis 26: 379–390.

(Horton, 2015). Horton, R. “Offline: What is Medicine’s 5 Sigma?” The Lancet 385: 1380.

(Howson, 1991). Howson, C. “The Old Evidence Problem.” British Journal for the Philosophy of Science 42: 547–555.

(Howson & Urbach, 2006). Howson, C. & Urbach, P. Scientific Reasoning: The Bayesian Approach. La Salle IL: Open Court.

(Hubbard & Bayarri, 2003). Hubbard, R. & Bayarri, M. J. “Confusion Over Measures of Evidence (P’s) Versus Errors (α’s) in Classical Statistical Testing (With Comments).” American Statistician 57: 171–182.

(Hubbard et al., 2019). Hubbard, R. et al. “The Limited Role of Formal Statistical Inference in Scientific Inference.” American Statistician 73 (S1): 91–98.

(Humphreys, 1988). Humphreys, P. “Non-Nietzschean Decision Making.” In J.H. Fetzer (ed.), Probability and Causality. Dordrecht: Kluwer. Pp. 253–268.

(Hunter, 1997). Hunter, J.E. “Needed: A Ban on the Significance Test.” Psychological Science (Special Section) 8: 3–7.

(Hurlbert & Lombardi, 2009). Hurlbert, S. H. & Lombardi, C. M. “Final Collapse of the Neyman-Pearson Decision Theoretic Framework and Rise of the NeoFisherian.” Annales Zoologici Fennici 46: 311–349.

(Hurlbert et al., 2019). Hurlbert, S. H. et al. “Coup de Grâce for a Tough Old Bull: ‘Statistically Significant’ Expires.” American Statistician 73 (S1): 352–357.

(Hylland & Zeckhauser, 1979). Hylland, A. & Zeckhauser, R. “The Impossibility of Bayesian Group Decision Making with Separate Aggregation of Beliefs and Values.” Econometrica 47: 1321–1336.

(Ioannidis & Trikalinos, 2005). Ioannidis, J.P.A. & Trikalinos, T. A. “Early Extreme Contradictory Estimates May Appear in Published Research: The Proteus Phenomenon in Molecular Genetics Research and Randomized Trials.” Journal of Clinical Epidemiology 58: 543–549.

(Ioannidis, 2005a). Ioannidis, J.P.A. “Why Most Published Research Findings are False.” PLOS Medicine 2, 8: 0696–0701.

(Ioannidis, 2005b). Ioannidis, J.P.A. “Contradictions in Highly Cited Research — Reply.” JAMA 294: 2695–2696.

(Ioannidis, 2008). Ioannidis, J.P.A. “Why Most Discovered True Associations are Inflated.” Epidemiology 19: 640–648.

(Ioannidis, 2014). Ioannidis, J.P.A. et al. “Publication and Other Reporting Biases in Cognitive Sciences: Detection, Prevalence, and Prevention.” Trends in Cognitive Sciences 18, 5: 235–241.

(Jefferson et al., 2002). Jefferson, T. et al. “Effects of Editorial Peer Review: A Systematic Review.” JAMA 287: 2784–2786.

(Johnson, 1999). Johnson, D.H. “The Insignificance of Statistical Significance Testing.” Journal of Wildlife Management 63: 763–772.

(Johnson, 2005). Johnson, D.H. “What Hypothesis Tests are Not: A Response to Colgrave and Ruxton.” Behavioral Ecology 16: 323–324.

(Kaplan, 1989). Kaplan, M. “Bayesianism without the Black Box.” Philosophy of Science 56: 48–69.

(Kaye, 1986). Kaye, D.H. “Is Proof of Statistical Significance Relevant?” Washington Law Review 61: 1333–1365.

(Kaye, 2004). Kaye, D. H. “Logical Relevance: Problems with the Reference Population and DNA Mixtures in People v Pizarro.” Law, Probability and Risk 3: 211–220.

(Kelly & Glymour, 2004). Kelly, K.T. & Glymour, C. “Why Probability Does Not Capture the Logic of Scientific Justification.” In C. Hitchcock (ed.), Contemporary Debates in Philosophy of Science. Malden MA: Blackwell. Pp. 94–114.

(Kelly & Schulte, 1995). Kelly, K. and Schulte, O. “The Computable Testability of Theories with Uncomputable Predictions.” Erkenntnis 42: 29–66.

(Kirk, 1996). Kirk, R.E. “Practical Significance: A Concept Whose Time Has Come.” Educational and Psychological Measurement 56: 746–759.

(Kyburg, 1978). Kyburg, H. “Subjective Probability: Criticisms, Reflections and Problems.” Journal of Philosophical Logic 7: 157–180.

(Kyburg, 1993). Kyburg, H. “The Scope of Bayesian Reasoning.” In D. Hull et al. (eds.), Philosophy of Science Association 1992. East Lansing MI: Philosophy of Science Association. Pp. 139–152.

(Ligertwood, 1996–1997). Ligertwood, A. “Bayesians and the World Out There.” International Journal of Evidence and Proof 1: 321–325.

(Lindley, 1957). Lindley, D.V. “A Statistical Paradox.” Biometrika 44: 187–192.

(Loftus, 1991). Loftus, G.R. “On the Tyranny of Hypothesis Testing in the Social Sciences.” Contemporary Psychology 36: 102–105.

(Lykken, 1968). Lykken, D.T. “Statistical Significance in Psychological Research.” Psychological Bulletin 70: 151–159.

(Martin, 1992). Martin, B. “Scientific Fraud and the Power Structure of Science.” Prometheus 10: 83–98.

(McCloskey, 1986). McCloskey, D. “Why Economic Historians Should Stop Relying on Statistical Tests of Significance and Lead Economists and Historians into the Promised Land.” Newsletter of the Cliometrics Society 2: 5–7.

(McShane, 2019). McShane, B.B. “Abandon Statistical Significance.” American Statistician 73 (S1): 235–245.

(Merton, 1968). Merton, R.K. “The Matthew Effect in Science.” Science 159: 56–63.

(Milne, 1991). Milne, P. “Annabel and the Bookmaker: An Everyday Tale of Bayesian Folk.” Australasian Journal of Philosophy 69: 98–102.

(Moonesinghe et al., 2007). Moonesinghe, R. et al. “Most Published Research Findings are False — But a Little Replication Goes a Long Way.” PLoS Medicine 4, 2: 0218–0221.

(Morey et al., 2014). Morey, R.D. et al. “Why Hypothesis Tests are Essential for Psychological Science: A Comment on Cumming.” Psychological Science 25: 1289–1290.

(Morey et al., 2016). Morey, R.D. et al. “The Fallacy of Placing Confidence in Confidence Intervals.” Psychonomic Bulletin & Review 23: 103–123.

(Morgan, 2003). Morgan, P.L. “Null Hypothesis Significance Testing: Philosophical and Practical Considerations of a Statistical Controversy.” Exceptionality 11: 209–221.

(Morrison & Henkel eds., 1970). Morrison, D.E. and Henkel, R.E. (eds.), The Significance Test Controversy. Chicago: Aldine.

(Nickerson, 2000). Nickerson, R.S. “Null Hypothesis Significance Testing: A Review of an Old and Continuing Controversy.” Psychological Methods 5: 241–305.

(Norton, 2011). Norton, J. D. “Challenges to Bayesian Confirmation Theory.” In P.S. Bandyopadhyay and M. Forster (eds.), Philosophy of Statistics: Vol. 7, Handbook of the Philosophy of Science. Amsterdam: North Holland. Pp. 391–437.

(Nunnally, 1960). Nunnally, J. “The Place of Statistics in Psychology.” Educational and Psychological Measurement 20: 641–650.

(Oakes, 1986). Oakes, M. Statistical Inference: A Commentary for the Social and Behavioral Sciences. New York: Wiley.

(Open Science Collaboration, 2015). Open Science Collaboration. “Estimating the Reproducibility of Psychological Science.” Science 349: aac4716–1 — aac4716–8.

(Pardo, 2000). Pardo, M.S. “Juridical Proof, Evidence and Pragmatic Meaning: Toward Evidentiary Holism.” Northwestern University Law Review 95: 399–442.

(Pardo, 2007). Pardo, M.S. “Reference Classes and Legal Evidence.” International Journal of Evidence and Proof 11: 255–258.

(Peters & Ceci, 1982). Peters, D. & Ceci, S. “Peer-Review Practices of Psychological Journals: The Fate of Submitted Articles, Submitted Again.” Behavioral and Brain Sciences 5: 187–255.

(Pratt, 1987). Pratt, J.W. “Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence: Comment.” Journal of the American Statistical Association 82: 123–125.

(Rawling, 1999). Rawling, P. “Reasonable Doubt and the Presumption of Innocence: The Case of the Bayesian Juror.” Topoi 18: 117–126.

(Redmayne, 2003). Redmayne, M. “Objective Probability and the Assessment of Evidence.” Law, Probability and Risk 2: 275–294.

(Redmayne, 2008). Redmayne, M. “Exploring the Proof Paradoxes.” Legal Theory 14: 281–309.

(Reichenbach, 1949). Reichenbach, H. The Theory of Probability. Berkeley CA: Univ. of California Press.

(Rhee, 2007). Rhee, R.J. “Probability, Policy and the Problem of the Reference Class.” International Journal of Evidence and Proof 11: 286–291.

(Roberts, 2007). Roberts, P. “From Theory into Practice: Introducing the Reference Class Problem.” International Journal of Evidence and Proof 11: 243–254.

(Rosenthal & Rubin, 1985). Rosenthal, R. and Rubin, D.B. “Statistical Analysis: Summarizing Evidence Versus Establishing Facts.” Psychological Bulletin 97: 527–529.

(Rosnow & Rosenthal, 1989). Rosnow, R.L. & Rosenthal, R. “Statistical Procedures and the Justification of Knowledge in Psychological Science.” American Psychologist 44: 1267–1284.

(Rozeboom, 1960). Rozeboom, W.W. “The Fallacy of the Null-Hypothesis Significance Test.” Psychological Bulletin 57: 416–428.

(Schmidt & Hunter, 2002). Schmidt, F.L. & Hunter, J.E. “Are There Benefits from NHST?” American Psychologist 57: 65–66.

(Schmidt, 1991). Schmidt, F.L. “Statistical Significance Testing and Cumulative Knowledge in Psychology: Implications for Training of Researchers.” Psychological Methods 1: 115–129.

(Selvin, 1957). Selvin, H.C. “A Critique of Tests of Significance in Survey Research.” American Sociological Review 22: 519–527.

(Shackel, 2007). Shackel, N. “Bertrand’s Paradox and the Principle of Indifference.” Philosophy of Science 74: 150–175.

(Shadish & Cook, 1999). Shadish, W. R. & Cook, T. D. “Comment — Design Rules: More Steps Towards a Complete Theory of Quasi-Experimentation.” Statistical Science 14: 294–300.

(Shafer, 1986). Shafer, G. “The Construction of Probability Arguments.” Boston University Law Review 66: 799–816.

(Shrader-Frechette, 2008). Shrader-Frechette, K. “Statistical Significance in Biology: Neither Necessary Nor Sufficient for Hypothesis Acceptance.” Biological Theory 3: 12–16.

(Shrout, 1997). Shrout, P. “Should Significance Tests Be Banned?” Psychological Science 8: 1–2.

(Simberloff, 1990). Simberloff, D. “Hypotheses, Errors and Statistical Assumptions.” Herpetologica 46: 351–357.

(Simmons, 2011). Simmons, J.P. “False Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” Psychological Science 22: 1359–1366.

(Simpson & Orlov, 1979–1980). Simpson, S. and Orlov, M. “Comment: An Application of Logic to the Law.” University of New South Wales Law Journal 3: 415–425.

(Smith, 2023). Smith, J. W. “Against the Academics: Peering at the Problem of Peer Review.” Against Professional Philosophy. June 18. Available online HERE.

(Smith, 1999). Smith, J.W. et al. The Bankruptcy of Economics. London: Macmillan.

(Smith & Smith, 2023a). Smith, J.W. & Smith, S. “Corruption, Falsity, and Fraud: The Epistemological Crisis of Professional Academic Research.” Against Professional Philosophy. 17 September. Available online HERE.

(Smith & Smith, 2023b). Smith, J. W. & Smith, S. “From Scientific Reproducibility to Epistemic Humility.” Against Professional Philosophy. 3 December. Available online at URL = <https://againstprofphil.org/2023/12/03/from-scientific-reproducibility-to-epistemic-humility/>.

(Smith, 2006). Smith, R. “Peer Review: A Flawed Process at the Heart of Science and Journals.” Journal of the Royal Society of Medicine 99: 178–182.

(Sowden, 1984). Sowden, L. “The Inadequacy of Bayesian Decision Theory.” Philosophical Studies 45: 293–313.

(Stein, 1996–1997). Stein, A. “Judicial Fact-Finding and the Bayesian Method: The Case for Deeper Scepticism About Their Combination.” International Journal of Evidence and Proof 1: 25–47.

(Sterne & Smith, 2001). Sterne, J.A.C. & Smith, G.D. “Sifting the Evidence — What’s Wrong with Significance Tests?” British Medical Journal 322: 226–231.

(Suppes, 2007). Suppes, P. “Where do Bayesian Priors Come From?” Synthese 156: 441–471.

(Tabarrok, 2005). Tabarrok, A. “Why Most Published Research Findings are False.” Marginal Revolution. 2 September. Available online at URL = <http://marginalrevolution.com/marginalrevolution/2005/09/why_most_publis.html>.

United States v Shonubi, 802 F Supp 859 (EDNY, 1992).

United States v Shonubi, 998 F2d 84 (2d Cir., 1993).

United States v Shonubi, 895 F Supp 480 (EDNY, 1995).

United States v Shonubi, 962 F Supp 370 (EDNY, 1997).

United States v Shonubi, 103 F3d 1085 (2d Cir, 1997).

(Van Fraassen, 1988). Van Fraassen, B.C. “The Problem of Old Evidence.” In D.F. Austin (ed.), Philosophical Analysis. Norwell: Kluwer. Pp. 153–165.

(Venn, 1876). Venn, J. The Logic of Chance. 2nd edn., London: Macmillan.

(Vogel, 2011). Vogel, G. “Scientific Misconduct: Psychologist Accused of Fraud on ‘Astonishing Scale’.” Science 334: 579.

(Wagner, 1997). Wagner, C.G. “Old Evidence and New Explanation.” Philosophy of Science 64: 677–691.

(Wasserstein & Lazar, 2016). Wasserstein, R. L. & Lazar, N.A. “The ASA Statement on P-Values: Context, Process, and Purpose.” American Statistician 70: 129–133.

(Williamson, 2007). Williamson, T. “How Probable is an Infinite Sequence of Heads?” Analysis 67: 173–180.

(Yong, 2015). Yong, E. “How Reliable Are Psychology Studies?” The Atlantic. 27 August. Available online at URL = <http://www.theatlantic.com/science/archive/2015/08/psychology-studies-reliability-reproducibility-nosek/4024661>.

(Ziliak & McCloskey, 2008). Ziliak, S.T. & McCloskey, D.N. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice and Lives. Ann Arbor MI: Univ. of Michigan Press.

(Zynda, 1995). Zynda, L. “Old Evidence and New Theories.” Philosophical Studies 77: 67–95.

Download