The last few weeks have seen a pretty hot academic debate about the proper use of p-values. Really, can you have a hot debate about p-values?!? Yes, you can! I regularly read Andrew Gelman's blog, and three posts on that topic (this, this, and this) that appeared in March and April 2019 attracted more than 800 (!) comments. And by comment, I mean real text, often quite long, and not just a thumbs up or thumbs down. So the topic apparently attracts a lot of attention, and it seems almost impossible to keep track of all these comments and contributions.
A brief and incomplete timeline
So, in case you have not heard about it, what is going on here? Clearly, the question of how to use p-values properly is not new; see here for an example from 1994. In 2015, the debate received renewed attention after the journal "Basic and Applied Social Psychology" banned the use of p-values (as well as t-values, confidence intervals, and the like). The editors explained their reasoning in an editorial, and this editorial seems to be by far the most frequently read article in the journal.
Then, in 2016, the American Statistical Association published a statement on p-values. It's not quite a "product recall", but it's close, maybe a product safety alert. In pretty clear words, the statement cautions against the typical use of p-values. Then, in March 2019, the journal "The American Statistician" published a special issue "Statistical Inference in the 21st Century: A World Beyond p < 0.05", with dozens of articles dealing with this topic. I have read about a third of these articles by now, and I can highly recommend taking the time to read them!
Parallel to the publication of this special issue, a group of researchers (Valentin Amrhein, Sander Greenland, Blake McShane) wrote a short piece to be published as a commentary in "Nature". The main point of the article was the pledge to "retire statistical significance". The authors invited researchers worldwide to sign this "petition" if they agreed. I agree with most of what they wrote, so I signed, along with more than 800 other researchers. Nature then published the piece under the slightly attention-grabbing headline "Scientists rise up against statistical significance". And, again, this created a lot of attention. As far as I know, this article has the highest Altmetric score of all articles tracked so far. As I write this, its score is 12795. Not bad. For a piece about p-values.
Why does this topic attract so much attention and discussion?
There are probably many reasons, but I want to mention three. (1) Statistical inference matters. Millions of researchers around the globe collect and analyze data, and their goal should be to draw valid conclusions. Whether we are using the right tools to do that is important. (2) Most empirical researchers have relied on p-values so far; this is the dominant paradigm. So when it is knocked off its pedestal, that concerns many researchers. (3) (Applied) statisticians are supposed to be able to make sense of numbers, right? That is, I would argue, the layperson's perspective. Statisticians apply their tools to extract the truth from the data, don't they? And analyzing data will bring certainty where uncertainty would otherwise prevail. But apparently there is a lot of uncertainty about how to properly analyze the data, or at least about how to interpret the uncertainty in the data.
Why should we retire statistical significance?
Clearly, the comment by Amrhein et al. does not advocate abandoning statistical analysis. Nor does it call for ignoring the uncertainty associated with an estimate. To me, the most important part of the comment is the point about not dichotomizing the evidence. An estimate with a p-value of .04 is not fundamentally or qualitatively different from an estimate with a p-value of .06. Concluding that the estimate with p = .04 "has an effect" while the estimate with p = .06 "has no effect" is wrong. As Gelman has written a zillion times, the difference between significant and not significant is not itself significant. Amrhein et al.'s comment and the surrounding publicity put a spotlight on this debate. There is also valid and relevant criticism of their comment, but I will save that for another day.
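To make these two points a bit more concrete, here is a small Python sketch. It assumes normally distributed estimates with known standard errors and two-sided tests of "true effect = 0"; the estimates A and B and their standard errors are made-up numbers for illustration, not data from any study. The first part shows that p = .04 and p = .06 correspond to nearly identical test statistics. The second part shows a case where A is "significant" and B is not, yet the difference between A and B is nowhere near significant.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def two_sided_p(estimate, se):
    """Two-sided p-value for H0: true effect = 0, using a normal approximation."""
    z = estimate / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def z_from_p(p):
    """The |z| statistic that produces a given two-sided p-value."""
    return STD_NORMAL.inv_cdf(1 - p / 2)

# Part 1: p = .04 vs. p = .06, the underlying test statistics barely differ.
z_04 = z_from_p(0.04)  # about 2.05
z_06 = z_from_p(0.06)  # about 1.88

# Part 2: hypothetical estimates (made-up numbers): A is "significant", B is not.
est_a, se_a = 25.0, 10.0
est_b, se_b = 10.0, 10.0
p_a = two_sided_p(est_a, se_a)  # about 0.012
p_b = two_sided_p(est_b, se_b)  # about 0.32

# The comparison that actually matters: is A different from B?
# Assumes A and B are independent, so the variances simply add.
diff = est_a - est_b
se_diff = sqrt(se_a**2 + se_b**2)
p_diff = two_sided_p(diff, se_diff)  # about 0.29

print(f"|z| for p = .04: {z_04:.2f}, |z| for p = .06: {z_06:.2f}")
print(f"p(A) = {p_a:.3f}, p(B) = {p_b:.3f}, p(A - B) = {p_diff:.3f}")
```

Labelling A as "an effect" and B as "no effect" suggests the two differ, but the direct test of A minus B gives p of about 0.29. If the two estimates were correlated, the standard error of the difference would also need a covariance term.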