Response to Susan Oliver video “Antivaxxers fooled by p-hacking and apples to oranges comparison”
The video and the tweet publicising it
On 26 June 2022 Susan Oliver published a video on YouTube titled “Antivaxxers fooled by p-hacking and apples to oranges comparison” in response to a preprint by 8 authors, one of whom was the well-known BMJ Senior Editor Peter Doshi. Susan refers to the paper as the “Doshi paper” and we will use the same reference here even though Doshi is the last, rather than first, named author. The paper demonstrates the increased risk of serious adverse events (SAEs) arising from the Pfizer and Moderna covid vaccine trials. Susan summarised her view of the paper in this tweet (which included the link to the video) that was retweeted by people like Prof Sir David Spiegelhalter (a world-renowned expert on probability and risk) and Prof Peter Hansen (Econometrician, Data Scientist, and Latané Distinguished Professor of Economics at UNC, Chapel Hill):
What Susan says in the video and why it totally misrepresents the Doshi paper
Susan spends 3 minutes highlighting a number of people she refers to as “anti-vaxxers” who tweeted about the paper, including Jordan Peterson, whom she refers to as a “self-declared best-selling author” (note: his 2018 book sold over 3 million copies and was number 1 on Amazon). Susan then states:
“It’s basically just a rubbish paper that uses a technique known as p-hacking followed by some apples to oranges comparisons”.
Interestingly, despite the video title, Susan spends less than 30 seconds describing what p-hacking is and instead refers to a paper about it (we agree entirely with the general concerns raised about p-hacking and show how it is avoided using Bayesian hypothesis testing). But the key flaw in Susan’s criticism is that the “Doshi paper” is not an example of p-hacking at all. The authors do not use p-values and, contrary to Susan’s repeated assertions, they make no claims at all of statistical significance. Rather, the paper provides risk differences and risk ratios with 95% confidence intervals (CIs) for the various comparisons of vaccine v placebo. For example, here is their table of results for all serious adverse events (SAEs) and also for the subset of serious adverse events of special interest (serious AESIs):
If the authors had been “p-hacking” they would have chosen a significance threshold such as 0.05, computed a p-value for each comparison of vaccine v placebo, and reported at least one comparison where the p-value fell below 0.05. They would then have claimed, for example, that the increased SAE rate was ‘significant’. They do nothing of the kind.
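To make the general concern about p-hacking concrete, here is a minimal sketch of why running many comparisons and reporting only those below a threshold is a problem. It assumes the comparisons are independent and that there is no real effect in any of them; the function name `chance_of_spurious_finding` is ours and purely illustrative.

```python
def chance_of_spurious_finding(num_comparisons, alpha=0.05):
    """Probability that at least one of `num_comparisons` independent tests
    gives p < alpha when there is no real effect in any of them."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** num_comparisons.
    return 1.0 - (1.0 - alpha) ** num_comparisons


for k in (1, 5, 20):
    print(f"{k:>2} comparisons: {chance_of_spurious_finding(k):.2f}")
```

With 20 independent comparisons the chance of at least one spurious ‘significant’ result is roughly 64%, which is why selective reporting of p-values is misleading. Reporting every comparison with its confidence interval, as the Doshi paper does, is a different exercise.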
Susan then claims that only by ‘combining’ the data from the different trials does Doshi get the (mythically claimed) ‘significant results’ and that such combining should simply not be done (this is one of her ‘apples and oranges’ comparison arguments). But, while it is true that the paper also looks at the combined numbers for each class of SAE, in each case the risk ratio for the combined numbers is actually lower than for the Pfizer trial alone. For example, for all SAEs the (median) risk ratio for Pfizer v placebo is 1.36 compared to just 1.15 for combined v placebo: the results are less, not more, ‘significant’. Our own Bayesian analysis of the results presented below makes this very clear.
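The arithmetic behind this point can be sketched as follows. The counts are hypothetical and chosen only for illustration (they are not the trial data), the interval uses the standard log-scale approximation for a ratio of two proportions, and ‘combining’ is assumed here to mean simply summing the counts across trials.

```python
import math


def risk_ratio_ci(events_t, n_t, events_c, n_c, z=1.96):
    """Risk ratio (treatment v control) with an approximate 95% CI,
    using the usual normal approximation on the log scale."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# HYPOTHETICAL counts, not taken from the paper:
# (events in vaccine arm, vaccine arm size, events in placebo arm, placebo arm size)
trial_a = (60, 10_000, 40, 10_000)
trial_b = (50, 10_000, 50, 10_000)
# Assumed form of combining: add the counts arm by arm.
combined = tuple(a + b for a, b in zip(trial_a, trial_b))

rr_a, lo_a, hi_a = risk_ratio_ci(*trial_a)
rr_b, lo_b, hi_b = risk_ratio_ci(*trial_b)
rr_c, lo_c, hi_c = risk_ratio_ci(*combined)

for name, (rr, lo, hi) in {"A": (rr_a, lo_a, hi_a),
                           "B": (rr_b, lo_b, hi_b),
                           "A+B": (rr_c, lo_c, hi_c)}.items():
    print(f"trial {name}: RR = {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this toy example the combined risk ratio falls between the two individual ratios, so combining a trial with a higher ratio and one with a lower ratio cannot produce a ratio more extreme than the higher one.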
Susan’s final criticisms of the Doshi paper concern the selection of SAEs and the possibility of ‘double counting’. Regarding selection, the events included and excluded are governed by the WHO-endorsed Brighton scheme and are not decided by the authors, so this is a critical error on Susan’s part. The Brighton list was created a priori, based on data available before any results were released from the trials. Any double counting, such as with the diarrhoea and abdominal pain example she uses, is a direct effect of the fact that the data are not public. There’s merit to both measures: counting the number of participants (with any SAE) and counting the number of events. If one person has two SAEs that is worse than one person having one SAE. “Double counting” sounds bad, but this is not double counting. Doshi et al are measuring how many SAEs occur in the vaccine group versus the placebo group. If diarrhoea and abdominal pain were each recorded as a SAE, then that is two SAEs. We don’t know which ones occurred in the same person because Pfizer and Moderna have not released individual participant data (IPD). In any case, the authors recognise that, because some SAEs occur in the same person, the SAEs are not all independent events; they note this in the paper and introduce an adjustment to the standard error to account for it. It is unclear whether the adjustment is sufficient, but it actually weakens their case (it widens the confidence intervals), so they can hardly be accused of bias.
Further regarding double counting, SAEs are counted individually to avoid them being hidden. So, if you get renal failure and then your penis drops off that should be two SAEs, not one. One person having three SAEs (renal failure, penis drops off, stroke) could be considered as serious as three people having a stroke; so, although some clinicians disagree, it is entirely reasonable to count SAEs separately. But Susan does not appear to understand what a SAE is. She assumes something like diarrhoea cannot be a SAE because lots of diarrhoea happens to be mild. But most covid is not serious, either. So diarrhoea can be a SAE if it’s serious enough and meets the regulatory criteria. And it’s a leading cause of death in some places.
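The difference between counting events and counting participants can be shown with a toy example. The records below are invented for illustration; real individual participant data would be needed to do this for the trials, and that is exactly what has not been released.

```python
# HYPOTHETICAL records of (participant id, serious adverse event).
# Participant "p1" has two separate SAEs.
records = [
    ("p1", "diarrhoea"),
    ("p1", "abdominal pain"),
    ("p2", "stroke"),
    ("p3", "renal failure"),
]


def count_events(rows):
    """Number of SAEs: every recorded event counts once."""
    return len(rows)


def count_participants(rows):
    """Number of participants with at least one SAE."""
    return len({participant for participant, _ in rows})


print("events:", count_events(records))
print("participants with any SAE:", count_participants(records))
```

Both numbers are legitimate summaries of the same records. Without participant identifiers only the event count is available, which is the situation the authors were in.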
That addresses all the ‘flaws’ that Susan claims about the paper. It is also important to note that, even when all the SAEs in the Pfizer and Moderna trials are combined, the absolute risk increase is fairly small, a fact already made clear by Doshi et al. (although this is to be balanced against the very low risks of severe covid, which is in essence the core message of the paper). They state that, in this case, the absolute risk increase (95% CI) is between 2.1 and 22.9 events per 10,000 participants. In our Bayesian analysis the median absolute risk increase is 12.9 events per 10,000 participants, with a CI from 0 to 27.
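For readers unfamiliar with how a Bayesian estimate of an absolute risk increase can be produced, here is a generic sketch. It is not the model used in our analysis: it assumes uniform Beta(1, 1) priors on each arm’s event rate, treats each count as ‘participants with an event’, and uses hypothetical counts rather than the trial data. The function name and its parameters are illustrative.

```python
import random
import statistics


def posterior_risk_difference(events_t, n_t, events_c, n_c,
                              draws=20_000, seed=0):
    """Monte Carlo summary of the posterior of (treatment rate - control rate),
    expressed per 10,000 participants, under ASSUMED Beta(1, 1) priors."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(draws):
        # With a Beta(1, 1) prior, the posterior of each rate is
        # Beta(1 + events, 1 + non-events).
        p_t = rng.betavariate(1 + events_t, 1 + n_t - events_t)
        p_c = rng.betavariate(1 + events_c, 1 + n_c - events_c)
        diffs.append((p_t - p_c) * 10_000)
    diffs.sort()
    median = statistics.median(diffs)
    lower = diffs[int(0.025 * draws)]       # approximate 2.5th percentile
    upper = diffs[int(0.975 * draws) - 1]   # approximate 97.5th percentile
    return median, lower, upper


# HYPOTHETICAL counts: 120 events among 10,000 vaccinated, 100 among 10,000 placebo.
median, lower, upper = posterior_risk_difference(120, 10_000, 100, 10_000)
print(f"median increase per 10,000: {median:.1f} "
      f"(95% interval {lower:.1f} to {upper:.1f})")
```

The output is read in the same way as the figures quoted above: a central estimate of extra events per 10,000 participants together with an interval expressing the uncertainty around it.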