PACE trial's extensive, major, post-trial revisions without adequate justification would be game fixing in the world of sport ...
By Alem Matthees, bmj.com
AllTrials supporters may be interested in the multiple major deviations/additions to the PACE Trial protocol, apparently occurring almost exclusively after the authors were already unblinded to the trial data and familiar with the distribution of various outcomes. This latest paper on mediators by Chalder et al. appears to continue this tradition.
The protocol was published in BioMed Central on the basis that "the authors/investigators are unlikely to be able to make revisions to their protocol". The editor(s) "strongly advise readers to contact the authors or compare with any published results article(s) to ensure that no deviations from the protocol occurred during the study." BioMed Central "believes that publishing study protocols will help to improve the standard of medical research by: [...] enabling readers to compare what was originally intended with what was actually done, thus preventing both 'data dredging' and post-hoc revisions of study aims." It is therefore concerning that the protocol underwent extensive, major, post-trial revisions, without adequate justification. All changes substantially decreased the stringency of the thresholds, made the tested therapies appear much more effective or less harmful than they otherwise would have, and led to widespread media hype.
The primary endpoint was completely abandoned after the trial ended. For fatigue this had been either 50% improvement or a Chalder Fatigue Questionnaire (CFQ) bimodal score of ≤3/11 points. For physical function this had been either 50% improvement or a Short-Form-36 physical function (SF-36/PF) score of ≥75/100 points. The "clinically useful difference" for individual participants (≥2/33 points CFQ Likert score and/or ≥8/100 points SF-36/PF) was introduced post-hoc and was significantly less stringent than the "positive outcome"(s) as previously defined.
The recovery criteria (a secondary analysis) underwent extensive, major, post-hoc changes, which made them much less stringent, to the point that it is highly doubtful whether anyone genuinely recovered. It became possible to be classified as completely "recovered" without clinically significant improvements to either fatigue or physical function. None of these changes, described below, were included in the statistical analysis plan that was finalized shortly before data unblinding.
1) The previously required CFQ bimodal score of ≤3/11 points was changed to a CFQ Likert score of ≤18/33. The change of scoring method obscures direct comparison, but a Likert score of 18 can correspond to a bimodal score of anywhere from 4 to 9, which the protocol regarded as abnormal or excessive fatigue. About 1% of participants simultaneously met both definitions of 'normal fatigue' (CFQ Likert ≤18/33) and 'severe fatigue' (CFQ bimodal ≥6/11) at baseline. Questions have therefore arisen over the method and normative population sample used to calculate this threshold.[9-12]
2) The previously required SF-36/PF score of ≥85/100 points was lowered to ≥60, which is worse than the trial eligibility criterion for 'significant disability' (≤65). About 13% of participants simultaneously met both definitions of 'normal physical function' (a criterion for complete recovery) and 'significant disability' at baseline. The post-hoc revised threshold was derived from an inappropriate statistical calculation using a non-representative population sample which included the elderly and disabled. CFS occurs at all ages, but in this trial of adults 97% were aged under 60 years at baseline, and a diagnosis of CFS requires that other chronic disabling conditions which could explain the fatigue etc. are excluded. The stated justification for this drastic change erroneously asserted that about half the general working age population score under 85; the actual figure is 17.6%. Note that 92.3% of the 'healthy' working age English population score 85 to 100, and 61.4% score 100.
3) The required CGI score of 1 ("very much better") was relaxed so that 2 ("much better") also counted towards recovery. The next option 3 ("a little better") was regarded as a non-improvement. A moderate improvement in CGI score is non-specific, could be a result of improvement to one complaint while multiple major symptoms remain, and on its own does not guarantee any clinically significant improvements to the primary efficacy measures of fatigue and physical function.[14-15]
4) No longer meeting Oxford CFS criteria did not guarantee real-world recovery. Participants who still met the Oxford criteria as usually applied, and who still experienced either severe fatigue or significant disability, could be disqualified from the case definition by failing ad hoc criteria for either, even if their CFQ and SF-36/PF scores remained abnormal or one remained unimproved. (The optional requirements of not meeting CDC CFS criteria or London ME criteria were superfluous, not entry requirements, stricter than the Oxford CFS criteria, made no difference to the results, and were improperly applied.[6,17])
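The scoring arithmetic behind points 1 and 2 can be checked with a short sketch. It rests on assumptions not spelled out above: that the CFQ has 11 items, each scored 0-3 under Likert scoring and 0,0,1,1 under bimodal scoring, and that SF-36/PF scores run from 0 to 100 in steps of 5. The function and variable names are illustrative only and do not come from the trial's analysis.

```python
N_ITEMS = 11  # assumed number of CFQ items (Likert max 33, bimodal max 11)

def likert_range(bimodal_score, n_items=N_ITEMS):
    """Lowest and highest Likert totals compatible with a bimodal score.

    Assumes an item counts 1 on the bimodal scale when its Likert score
    is 2 or 3, and 0 when its Likert score is 0 or 1.
    """
    k = bimodal_score
    lowest = 2 * k                    # k items at 2, the rest at 0
    highest = 3 * k + (n_items - k)   # k items at 3, the rest at 1
    return lowest, highest

def bimodal_range(likert_total, n_items=N_ITEMS):
    """Lowest and highest bimodal scores compatible with a Likert total."""
    feasible = [
        k for k in range(n_items + 1)
        if likert_range(k, n_items)[0] <= likert_total <= likert_range(k, n_items)[1]
    ]
    return min(feasible), max(feasible)

# Point 1: bimodal scores that a Likert total of 18 can represent.
likert_18_as_bimodal = bimodal_range(18)

# Point 1: Likert totals that satisfy both 'normal fatigue' (Likert <= 18)
# and 'severe fatigue' (bimodal >= 6) when the bimodal score is exactly 6.
low, high = likert_range(6)
both_fatigue_definitions = [t for t in range(low, high + 1) if t <= 18]

# Point 2: SF-36/PF scores meeting both the revised 'normal' threshold
# (>= 60) and the entry criterion for 'significant disability' (<= 65).
SF36_PF_SCORES = range(0, 101, 5)  # assumed 5-point scoring steps
both_function_definitions = [s for s in SF36_PF_SCORES if 60 <= s <= 65]

print(likert_18_as_bimodal)
print(both_fatigue_definitions)
print(both_function_definitions)
```

Under these assumptions the sketch reproduces the 4 to 9 bimodal range quoted in point 1, and shows that the overlapping thresholds in points 1 and 2 are arithmetically possible. It says nothing about how many participants actually fell into the overlap.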
The technical details of the "planned" mediation analysis are not adequately covered in the published protocol or the statistical analysis plan. It is unknown what methodological changes occurred during this exploratory analysis or whether it was influenced by 'data dredging' and 'post-hoc revisions'. Earlier results were described in 2011 as: "There was modest mediation of CBT and GET effects (approximately 20% of the total effect)." Now much higher figures are being reported, even for individual mediators, including, "fear avoidance beliefs, the strongest mediator, accounted for up to 60% of the overall effect." Interestingly, Chalder et al. and the accompanying editorial by Knoop & Wiborg conceded that the causal relationships between mediators and outcomes were unclear.
The 60% figure compared GET with a non-representative version of pacing (APT), but news articles have misrepresented this as strong evidence that patients recover by overcoming their fears and exercising. This is contradicted by "an almost complete absence of improvements in objectively measured outcomes", including the fitness step-test, which indicates that participants failed to exercise more, despite GET aiming to substantially increase activity/function, e.g. 30 minutes of exercise at 60-75% maximum heart rate at least 5 times per week. The exception (GET walking distances) was not clinically significant and was not due to improved fitness. These results dispute the deconditioning model and instead reflect an activity ceiling determined by post-exertional symptoms and abnormal (pathophysiological) responses to exercise.[21,22]
This non-blinded trial tested therapies that aimed to change participants' beliefs about symptoms and impairments, so the discrepancy between subjective and objective outcomes raises plausible concerns about bias in self-reports.[23,24]