Blog post by Prof James C Coyne, 23 April 2016: Probing an untrustworthy Cochrane review of exercise for “chronic fatigue syndrome”
From my work in progress:
My ongoing investigation so far has revealed that the 2016 Cochrane review misrepresents what was done and what was found in a key meta-analysis. These problems are related to an undeclared conflict of interest.
The first author and spokesperson for the review, Lillebeth Larun, was also the first author on the protocol for a Cochrane review that is not yet published.
Larun L, Odgaard-Jensen J, Brurberg KG, Chalder T, Dybwad M, Moss-Morris RE, Sharpe M, Wallman K, Wearden A, White PD, Glasziou PP. Exercise therapy for chronic fatigue syndrome (individual patient data) (Protocol). Cochrane Database of Systematic Reviews 2014, Issue 4. Art. No.: CD011040.
At a meeting organized and financed by PACE investigator Peter White, Larun obtained privileged access to data that the PACE investigators have spent tens of thousands of pounds to keep most of us from viewing. Larun used this information to legitimize outcome switching or p-hacking favorable to the PACE investigators’ interests. The Cochrane review misled readers in presenting how some analyses were conducted that were crucial to its conclusions.
One of the crucial functions of Cochrane reviews is to protect policymakers, clinicians, researchers, and patients from the questionable research practices that trial investigators use to promote a particular interpretation of their results. This Cochrane review fails miserably in this respect. Cochrane is complicit in endorsing the PACE investigators’ misinterpretation of their findings.
A number of remedies should be implemented. The first would be for Cochrane Editor in Chief and Deputy Chief Director Dr. David Tovey to call publicly for the release, for independent reanalysis, of the PACE trial data from the original outcomes paper in The Lancet and the follow-up data reported in Lancet Psychiatry.
Given the breach in trust with the readership of Cochrane that has occurred, Dr. Tovey should announce that the individual patient-level data used in the ongoing review will be released for independent re-analysis.
Larun should be removed from the Cochrane review that is in progress. She should recuse herself from further comment on the 2016 review. Her misrepresentations and comments thus far have tarnished Cochrane’s reputation for unbiased assessment and for correcting mistakes when they are made.
An expression of concern should be posted for the 2016 review.
The 2016 Cochrane review of exercise for chronic fatigue syndrome:
Larun L, Brurberg KG, Odgaard-Jensen J, Price JR. Exercise therapy for chronic fatigue syndrome. Cochrane Database Syst Rev. 2016; CD003200.
It added only three studies that were not included in a 2004 Cochrane review of five studies:
Wearden AJ, Dowrick C, Chew-Graham C, Bentall RP, Morriss RK, Peters S, et al. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. BMJ 2010; 340 (1777):1–12. [DOI: 10.1136/bmj.c1777]
Hlavaty LE, Brown MM, Jason LA. The effect of homework compliance on treatment outcomes for participants with myalgic encephalomyelitis/chronic fatigue syndrome. Rehabilitation Psychology 2011;56(3):212–8.
White PD, Goldsmith KA, Johnson AL, Potts L, Walwyn R, DeCesare JC, et al. Comparison of adaptive pacing therapy, cognitive behaviour therapy, graded exercise therapy, and specialist medical care for chronic fatigue syndrome (PACE): a randomised trial. The Lancet 2011; 377:611–90.
This blog post concentrates on a subanalysis, reported on page 68, that is crucial to the conclusions of the 2016 review. I welcome others to extend this scrutiny to other analyses in the review, especially the one for the SF-36.
Analysis 1.1. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 1 Fatigue (end of treatment).
The only subanalysis that involves new studies includes the Wearden et al FINE trial, the White et al PACE trial, and an earlier study, Powell et al. The meta-analysis gives 27.2% weight to Wearden et al and 62.9% weight to White et al.
Inclusion of the Wearden et al FINE trial in the meta-analysis
Concerning Wearden et al, the Cochrane review states on page 49:
This is untrue.
Cochrane used a ‘Likert’ scoring method (0,1,2,3), but the original Wearden et al. paper reports using the…
11 item Chalder et al fatigue scale,19 where lower scores indicate better outcomes. Each item on the fatigue scale was scored dichotomously on a four point scale (0, 0, 1, or 1).
This would seem a trivial difference, but this outcome switching will take on increasing importance as we proceed.
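To make the difference concrete, here is a minimal sketch of the two scoring methods in Python. It assumes each of the 11 items is stored as a response option coded 0 to 3, with higher codes meaning more fatigue. The function names and the two response patterns are hypothetical illustrations, not data from any trial.

```python
# Sketch of the two Chalder fatigue scale scoring methods.
# Assumption: each of the 11 items is stored as a response option coded 0-3,
# where higher codes mean more fatigue. Names and sample data are hypothetical.

N_ITEMS = 11


def _validate(responses):
    if len(responses) != N_ITEMS or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 11 responses, each coded 0, 1, 2 or 3")


def likert_score(responses):
    """Likert scoring (0, 1, 2, 3): sum of the option codes, range 0-33."""
    _validate(responses)
    return sum(responses)


def bimodal_score(responses):
    """Bimodal scoring (0, 0, 1, 1): count of items coded 2 or 3, range 0-11."""
    _validate(responses)
    return sum(1 for r in responses if r >= 2)


# Two made-up response patterns, not taken from any trial.
patient_a = [2] * 6 + [1] * 5
patient_b = [3] * 4 + [1] * 7

for name, responses in (("A", patient_a), ("B", patient_b)):
    print(name, "bimodal:", bimodal_score(responses), "Likert:", likert_score(responses))
```

In this made-up case, patient A scores higher than patient B under bimodal scoring (6 versus 4) but lower under Likert scoring (17 versus 19), so the choice of method can change who looks more fatigued.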
Based on a tip from Robert Courtney, I found the first mention of a re-scoring of the Chalder fatigue scale in the Wearden study in a BMJ Rapid Response:
Wearden AJ, Dowrick C, Chew-Graham C, Bentall RP, Morriss RK, Peters S, et al. Nurse led, home based self help treatment for patients in primary care with chronic fatigue syndrome: randomised controlled trial. BMJ, Rapid Response 27 May 2010.
The excuse that was offered for the rescoring in the Rapid Response was:
“Following Bart Stouten’s suggestion that scoring the Chalder fatigue scale (1) 0123 might more reliably demonstrate the effects of pragmatic rehabilitation, we recalculated our fatigue scale scores.”
“Might more reliably demonstrate…”? Where I come from, we call this outcome switching, p-hacking, a questionable research practice, or simply cheating.
In the original reporting of the trial, effects of exercise were not significant at follow-up. With the rescoring of the Chalder fatigue scale, these results are now significant.
Inclusion of the White et al PACE trial in the meta-analysis
A physician who suffers from myalgic encephalomyelitis (ME) – what both the PACE investigators and Cochrane review term “chronic fatigue syndrome” – sent me the following comment:
I have recently published a review of the PACE trial and follow-up articles and according to the Chalder Fatigue Questionnaire, when using the original bimodal scoring I only score 4 points, meaning I was not ill enough to enter the trial, despite being bedridden with severe ME. After changing the score in the middle of the trial to Likert scoring, the same answers mean I suddenly score the minimum number of 18 to be eligible for the trial yet that same score of 18 also meant that without receiving any treatment or any change to my medical situation I was also classed as recovered on the Chalder Fatigue Questionnaire, one of the two primary outcomes of the PACE trial.
So according to the PACE trial, despite being bedridden with severe ME, I was not ill enough to take part, ill enough to take part and recovered all 3 at the same time …
Yet according to Larun et al. there’s nothing wrong with the PACE trial.
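The physician's actual answers are not given, so the sketch below uses one hypothetical response pattern, among several possible ones, that produces the same pair of scores: 4 under bimodal scoring and 18 under Likert scoring. The thresholds are the ones quoted later in this post from Sam Carter's comment. The function name, the constant names, and the 0 to 3 coding of each item are assumptions made for illustration.

```python
# Thresholds as quoted in this post (Sam Carter's comment on the FINE data).
PACE_ENTRY_BIMODAL_MIN = 6        # bimodal score >= 6: fatigued enough to enter PACE
ABNORMAL_FATIGUE_BIMODAL_MIN = 4  # bimodal score >= 4: abnormal fatigue
PACE_RECOVERY_LIKERT_MAX = 18     # Likert score <= 18: post-hoc recovery threshold


def classify(responses):
    """Score 11 items coded 0-3 both ways and apply the thresholds above."""
    if len(responses) != 11 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected 11 responses, each coded 0, 1, 2 or 3")
    bimodal = sum(1 for r in responses if r >= 2)  # (0, 0, 1, 1) scoring
    likert = sum(responses)                        # (0, 1, 2, 3) scoring
    return {
        "bimodal": bimodal,
        "likert": likert,
        "meets_entry_threshold": bimodal >= PACE_ENTRY_BIMODAL_MIN,
        "abnormal_fatigue": bimodal >= ABNORMAL_FATIGUE_BIMODAL_MIN,
        "meets_recovery_threshold": likert <= PACE_RECOVERY_LIKERT_MAX,
    }


# Hypothetical pattern: seven items coded 1, three coded 3, one coded 2.
hypothetical_responses = [1] * 7 + [3] * 3 + [2]
result = classify(hypothetical_responses)
for key, value in result.items():
    print(f"{key}: {value}")
```

For this pattern the same set of answers falls below the bimodal entry threshold, counts as abnormal fatigue, and meets the Likert recovery threshold, all at once.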
Results of the Wearden et al FINE trial were available to the PACE investigators when they performed the controversial switching of outcomes for their trial. This should be taken into account in interpreting Larun’s defense of the PACE investigators in response to a comment from Tom Kindlon. She stated:
You particularly mention the risk of bias in the PACE trial regarding not providing pre-specified outcomes however the trial did pre-specify the analysis of outcomes. The primary outcomes were the same as in the original protocol, although the scoring method of one was changed and the analysis of assessing efficacy also changed from the original protocol. These changes were made as part of the detailed statistical analysis plan (itself published in full), which had been promised in the original protocol. These changes were drawn up before the analysis commenced and before examining any outcome data. In other words they were pre-specified, so it is hard to understand how the changes contributed to any potential bias.
I think that what we have seen here so far gives us good reason to side with Tom Kindlon rather than Lillebeth Larun on this point.
Also relevant is an excellent PubMed Commons comment by Sam Carter, Exploring changes to PACE trial outcome measures using anonymised data from the FINE trial. His observations about the Chalder fatigue questionnaire:
White et al wrote that “we changed the original bimodal scoring of the Chalder fatigue questionnaire (range 0–11) to Likert scoring to more sensitively test our hypotheses of effectiveness” (1). However, data from the FINE trial show that Likert and bimodal scores are often contradictory and thus call into question White et al’s assumption that Likert scoring is necessarily more sensitive than bimodal scoring.
For example, of the 33 FINE trial participants who met the post-hoc PACE trial recovery threshold for fatigue at week 20 (Likert CFQ score ≤ 18), 10 had a bimodal CFQ score ≥ 6 so would still be fatigued enough to enter the PACE trial and 16 had a bimodal CFQ score ≥ 4 which is the accepted definition of abnormal fatigue.
Therefore, for this cohort, if a person met the PACE trial post-hoc recovery threshold for fatigue at week 20 they had approximately a 50% chance of still having abnormal levels of fatigue and a 30% chance of being fatigued enough to enter the PACE trial.
A further problem with the Chalder fatigue questionnaire is illustrated by the observation that the bimodal score and Likert score of 10 participants moved in opposite directions at consecutive assessments i.e. one scoring system showed improvement whilst the other showed deterioration.
Moreover, it can be seen that some FINE trial participants were confused by the wording of the questionnaire itself. For example, a healthy person should have a Likert score of 11 out of 33, yet 17 participants recorded a Likert CFQ score of 10 or less at some point (i.e. they reported less fatigue than a healthy person), and 5 participants recorded a Likert CFQ score of 0.
The discordance between Likert and bimodal scores and the marked increase in those meeting post-hoc recovery thresholds suggest that White et al’s deviation from their protocol-specified analysis is likely to have profoundly affected the reported efficacy of the PACE trial interventions.
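The rounded percentages in Carter's comment follow directly from the counts he reports. A short check in Python, using only those counts (the variable names are mine):

```python
# Counts reported in Sam Carter's comment on the FINE trial data at week 20.
met_likert_recovery = 33   # participants with Likert CFQ score <= 18
still_entry_level = 10     # of those, bimodal CFQ score >= 6
still_abnormal = 16        # of those, bimodal CFQ score >= 4

share_abnormal = still_abnormal / met_likert_recovery
share_entry_level = still_entry_level / met_likert_recovery

print(f"Still abnormally fatigued: {share_abnormal:.1%}")
print(f"Still fatigued enough to enter PACE: {share_entry_level:.1%}")
```

This gives 48.5% and 30.3%, which match the "approximately 50%" and "30%" stated in the comment.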
Compare White et al.’s “more sensitively test our hypotheses” to Wearden et al.’s “might more reliably demonstrate…” explanation for switching outcomes.
The review’s assessment of risk of bias for the White et al PACE trial needs to be corrected.
A figure on page 68 shows results of a subanalysis with the switched outcomes at the end of treatment.
This meta-analysis concludes that exercise therapy produced an almost 3-point drop in fatigue on the rescored Chalder scale at the end of treatment.
Analysis 1.2. Comparison 1 Exercise therapy versus treatment as usual, relaxation or flexibility, Outcome 2 Fatigue (follow-up).
A table on page 69 shows results of a subanalysis with the switched outcomes at follow-up:
This meta-analysis depends entirely on the revised scoring of the Chalder fatigue scale and on the FINE and PACE trials. It suggests that the three-point drop in fatigue persists at follow-up.
But Cochrane should have stuck with the primary outcomes specified in the original trial registrations. That would have been consistent with what Cochrane usually does, what it says it did here, and what its readers expect.
Readers were not at the meeting that the PACE investigators financed and cannot get access to the data on which the Cochrane review depends. So they depend on Cochrane as a trusted source.
I am sure the results would be different if the expected and appropriate procedures had been followed. Cochrane should alert readers with an Expression of Concern until the record can be corrected or the review retracted.
Is it too much to ask that Cochrane get out of bed with the PACE investigators?
What would Bill Silverman say? Rather than speculate about someone whom neither Dr. Tovey nor I ever met, I ask Dr. Tovey, “What would Lisa Bero say?”