# The curious perfect p-value: a case study in defamation and ignorance

Updated: Jun 25, 2022

*UPDATE 6 June 2022: Sheldrick's defamatory article has now been removed from his blog. My understanding is that this is a result of legal action against him. But note that the __Times is still citing Sheldrick__ as if he is a credible source of Covid information.*

**1. The accusation**

__Kyle Sheldrick__ is making a name for himself as someone determined to expose those who he claims are guilty of spreading Covid ‘misinformation’. He has a particular obsession with going after people who promote real-world studies of early effective Covid treatment. One such person is __Paul Marik__, a highly respected doctor with 30 years’ experience spanning pharmacology, anesthesiology, and critical care, and many hundreds of highly cited peer-reviewed articles. Not content with trying to discredit the Covid work of people like Marik, on 22 March Sheldrick __wrote a blog article in which he accused Marik and his co-authors of fraud__ relating to __a 2017 study about vitamin C treatment for sepsis published in the CHEST Journal__.

The basis for his potentially defamatory claims was that Marik’s study used data which Sheldrick said must have been fraudulent because the patients in the control group and treatment group were ‘too well matched’ for it to be by coincidence.

Before analysing Sheldrick's claim it is important to note that Marik's study began as an observational study in which the patient outcomes were good. To give the study more substance, the nurses went back through the same hospital's patient data and pulled records of patients who met the same criteria as those observed. This was a retrospective pairing, and it was never meant to be random. But even ignoring this, Sheldrick's claim of fraud is wrong.

The problem is that, to make his conclusion, Sheldrick used a statistical test which he clearly did not understand, and which was in any case totally inappropriate for his (ill-defined) hypothesis of fraud.

Online researchers __here__ and __here__ have provided comprehensive explanations of the many reasons why Sheldrick’s argument is badly flawed. But missing so far has been an explanation of exactly what the statistic used by Sheldrick is and how it is computed. Once we show what it is, it becomes evident just how ludicrous the fraud claims are, even ignoring the fact that the control group patients were selected to be well matched.

**2. Sheldrick's evidence**

Sheldrick presents his ‘evidence’ in the form of this table:

The rows are the various attributes (personal or medical conditions) of the patients. There were 47 patients in the treatment group and 47 in the control group. The first (resp. second) column is the number of patients in the treatment (resp. control) group with the attribute, while the third (resp. fourth) column is the number of patients in the treatment (resp. control) group without the attribute. So columns 1 and 3 sum to 47 and columns 2 and 4 sum to 47.

Sheldrick’s hypothesis is that the control group and treatment group are too similarly matched in too many attributes for this to have happened by chance (there are, for example, 6 of the 24 attributes where the number with the attribute is equal in both groups). He claims that the statistical evidence for this is the set of values in the last column. These values are the results of a particular statistical ‘significance’ test – the “p Value From Fisher Exact test” – applied to the first four column values. He claims that, if there were no deliberate attempt to make the numbers in each group similar, the values you should be seeing here would average 0.5. The fact that so many of them are equal to 1, and most of the others are above 0.5, is – according to Sheldrick – proof of fraud. But he is wrong, even if we ignore the various legitimate reasons (well covered in the online criticisms) why there would inevitably be similarities.
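An average of 0.5 is the wrong benchmark for discrete data: with counts this small, the Fisher exact test is conservative, and its p-values cluster well above 0.5 even when the two groups are generated completely independently of each other. Here is a minimal simulation sketch (in Python rather than whatever Sheldrick used; the prevalence range is purely illustrative, and the two-sided rule – summing the probabilities of all tables no more likely than the observed one – is the convention used by R's `fisher.test`):

```python
import random
from math import comb

def fisher_p(with_t, with_c, n=47):
    """Two-sided Fisher exact p-value for two groups of n patients,
    of whom with_t (treatment) and with_c (control) have the attribute.
    Under the null, the treatment count follows a hypergeometric
    distribution: N = 2n patients, K = with_t + with_c with the attribute,
    n of them drawn into the treatment group."""
    N, K = 2 * n, with_t + with_c
    def pmf(k):
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)
    p_obs = pmf(with_t)
    # sum the probabilities of all tables no more likely than the observed one
    return sum(pmf(k) for k in range(max(0, K - n), min(K, n) + 1)
               if pmf(k) <= p_obs * (1 + 1e-9))

random.seed(1)
ps = []
for _ in range(2000):
    q = random.uniform(0.05, 0.5)                    # illustrative prevalence
    t = sum(random.random() < q for _ in range(47))  # treatment group count
    c = sum(random.random() < q for _ in range(47))  # independent control count
    ps.append(fisher_p(t, c))

print(round(sum(ps) / len(ps), 2))  # well above 0.5, with no fraud involved
```

So high p-values, including many exact 1s, are exactly what independent groups of this size produce; they are not evidence of copying.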

**3. So what is the "p Value From Fisher Exact test"?**

Those who know me know that, as a Bayesian, I regard p-values and classical statistical tests of significance as arbitrary, overly complex and totally unnecessary (see Appendix below); many people who use them have no idea what they mean. But, since this is what Sheldrick is using, let's see exactly what the p-value statistic in the last column of his table is. Sheldrick assumes that everybody knows what it is and how it is calculated. He does not provide a definition and, as this tweet shows, he does not know or understand it himself (it is NOT based on the chi-squared distribution):
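That it is not a chi-squared test can be checked directly. Here is a sketch (in Python; the row counts are purely illustrative, not taken from Sheldrick's table) comparing the exact p-value with the Pearson chi-squared p-value for the same 2x2 table, using the fact that a chi-squared variable with one degree of freedom is a squared standard normal:

```python
from math import comb, erfc, sqrt

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p for the table [[a, b], [c, d]]
    (a, b = treatment with/without; c, d = control with/without)."""
    N, K, n = a + b + c + d, a + c, a + b
    def pmf(k):
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)
    p_obs = pmf(a)
    return sum(pmf(k) for k in range(max(0, K + n - N), min(K, n) + 1)
               if pmf(k) <= p_obs * (1 + 1e-9))

def chi2_p(a, b, c, d):
    """Pearson chi-squared p, no continuity correction; a 2x2 table has
    one degree of freedom, so the tail probability is erfc(sqrt(x/2))."""
    N = a + b + c + d
    x = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return erfc(sqrt(x / 2))

# illustrative row: 10 of 47 treatment and 16 of 47 control patients
print(round(fisher_p(10, 37, 16, 31), 3), round(chi2_p(10, 37, 16, 31), 3))
```

The two p-values clearly differ: the exact test sums hypergeometric probabilities directly rather than comparing a statistic against the chi-squared distribution.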

Instead, since he does not define or understand it, we can assume Sheldrick used a pre-defined function (possibly in the R programming language or similar, since this gives the same results as Sheldrick's) to compute it. In fact, there does not seem to be a ‘standard’ definition for this statistic, and there are indeed online calculators like __this__ that give completely different values. For the general case it is quite a __complex definition and calculation__. However, when the total number of people in the control and treatment groups is the same (which it is here, 47 in each) the definition and calculation of the test (as defined by the R function) is much simpler. So, I will stick with the definition and calculation for this simpler case, because it allows us to show exactly how the numbers in Sheldrick’s final column are calculated and why they don't mean what Sheldrick thinks they mean.
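The equal-group-size case can be sketched concretely as follows (in Python rather than R; the two-sided rule, summing the probabilities of every table no more likely than the one observed, follows the convention of R's `fisher.test`, which the article's numbers match):

```python
from math import comb

def fisher_p_equal_groups(with_t, with_c, n=47):
    """Two-sided Fisher exact p-value when the treatment and control groups
    both contain n patients, of whom with_t / with_c have the attribute.

    Under the null, the treatment count follows a hypergeometric
    distribution: N = 2n patients in total, K = with_t + with_c of them
    with the attribute, n of them drawn into the treatment group."""
    N, K = 2 * n, with_t + with_c
    def pmf(k):
        return comb(K, k) * comb(N - K, n - k) / comb(N, n)
    p_obs = pmf(with_t)
    # two-sided rule: sum the probabilities of all tables whose probability
    # does not exceed that of the observed table
    return sum(pmf(k) for k in range(max(0, K - n), min(K, n) + 1)
               if pmf(k) <= p_obs * (1 + 1e-9))

# a perfectly matched row (6 with the attribute in each group) gives p = 1,
# because the observed table is the single most probable one
print(round(fisher_p_equal_groups(6, 6), 6))   # -> 1.0
# a mismatched row gives a smaller p-value
print(round(fisher_p_equal_groups(10, 16), 3))
```

Note that when the two counts are equal, the observed table sits at the mode of the hypergeometric distribution, so every possible table is "no more likely" than it and the p-value is exactly 1 – which is why equal counts always produce the 1s in Sheldrick's last column.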