Virology blog post by Dr David Tuller, 17 July 2017: Trial by error: the NICE guidelines and more on the CDC
… What is obvious, but what NICE appears to overlook, is that the CBT/GET literature is plagued by two overarching flaws. The first is that these are mostly open-label trials with subjective outcomes—a study design rejected in other fields of medicine because it is so prone to bias. The second is that many of the studies, especially older ones, rely on the use of the Oxford criteria or other broad case definitions that yield heterogeneous samples because they conflate chronic fatigue and chronic fatigue syndrome. These studies simply cannot provide credible evidence about treatments for the specific disease that patients prefer to call ME. (PACE, of course, suffers from many other flaws besides these two.)
Some have argued that it is better to be able to provide patients with some recommendations rather than none. It is therefore important to stress that a recommendation should not be kept just because there are no better recommendations to make. If a recommendation is based on results from studies that promote bias, or on results that have been inflated through outcome-switching or have been derived from heterogeneous samples or are inaccurate for other reasons, then the recommendation needs to be rescinded, even if there is not sufficient current evidence to suggest other proven treatment approaches. That is especially the case when patient surveys and biomedical evidence raise concerns that a recommended strategy—in this instance, a steady increase in activity levels—is causing serious harm rather than benefit.
The fact that NICE decided to reconsider the 2007 guidelines this year seemed like a potentially promising development, but it was unclear what new information the agency would consider. For example, would the surveillance review include only reports from clinical trials of therapeutic interventions? Or would it also include findings of physiological abnormalities, most from research produced outside the U.K., which undermine the deconditioning theory that supports use of CBT and GET?
Other questions: Would the review consider the conclusion of the 2015 report from the U.S. Institute of Medicine (now the National Academy of Medicine) that “exertion intolerance” was a cardinal symptom, which raises questions about whether GET is contra-indicated? Would it consider that the U.S. Agency for Healthcare Research and Quality found little or no evidence for CBT and GET after removing Oxford criteria studies from its analyses? And would it include the reanalyses of the reported PACE “recovery” and “improvement” findings, which were dramatically boosted by post-hoc outcome changes?
NICE’s “surveillance proposal consultation document” for CG53, recently posted on its website, provides answers to many of the questions. This 56-page report offers details about the NICE surveillance review and the reasons for the agency’s provisional decision not to change the guidelines. The review was apparently conducted by a NICE “surveillance team” with input from an unidentified number of unidentified “topic experts.” (I have filed a freedom-of-information request with NICE for the names of the experts it consulted. NICE has 20 working days to respond. The ME Association is also seeking the names, and the Countess of Mar has asked the Department of Health for the same information.)
The consultation document indicates that the surveillance team and topic experts did in fact take notice of the recent controversies and the new literature, including the reanalyses, before concluding that the guidelines should remain the same. The document noted that some upcoming research could impact the guidelines down the line, and identified specifically a study of online CBT for kids—an apparent reference to FITNET-NHS. This study exemplifies some of the problems common in this field of research, as I described on Virology Blog months ago. (Professor Esther Crawley of Bristol University, the trial’s lead investigator, subsequently referred to that blog post as “libelous” in a slide she showed during at least two speeches. She has not documented her charge.)
The consultation document also notes that only study abstracts, not the studies themselves, were reviewed. This is a surprising methodological choice given the significance of the issue. Abstracts can be seriously misleading and incomplete; studies themselves obviously provide a much more authoritative and nuanced picture. It does not seem too much to expect that those responsible for establishing enormously influential clinical guidelines should have taken the time to examine the actual research on which they were basing their recommendations. To learn that they did not is rather shocking.
In response to the controversy over the PACE trial, the document notes more than once that the investigators themselves have responded to criticisms, citing the FAQ on the trial website and other publications. The surveillance team appears to accept these responses at face value—as thorough and honest explanations. Perhaps no one has examined them closely enough to realize how empty and full of half-truths they are. The PACE investigators have certainly tried to defend their work. But there are no reasonable answers to many of the concerns, so their responses to date have only satisfied their ideological companions and those who know little about the debate.
The consultation document contains some troubling inaccuracies in its discussion of the PACE trial. For example, it reports that “the PACE authors note that…changed thresholds for recovery were pre-specified.” But it is simply false to call the revised thresholds “pre-specified.” The recovery thresholds for physical function and fatigue—two of the four recovery criteria in the 2013 paper published in Psychological Medicine—were the same as the “normal range” thresholds included in the 2011 Lancet paper. In that earlier paper, these thresholds were presented as part of post-hoc analyses, so it is hard to understand how the same thresholds could also be “pre-specified” under any standard definition of the term.
Besides that, all four recovery criteria were weakened, so it was self-evident that each change, whether pre-specified or not, would boost the number of participants the investigators could report as having achieved “recovery.” Moreover, the 2013 paper does not cite any oversight committee approval for the major changes to the “recovery” definition—an oversight that should have raised alarm bells for Psychological Medicine. (Since the NICE surveillance team only reviewed abstracts and not actual papers, it would not have noticed this unusual lack of oversight committee approval.) And the consultation document fails to mention the fact that 13 percent of the PACE participants were already “recovered” for physical function at entry—an anomaly that should have prevented publication.
In another inaccurate (or at least highly disingenuous) statement, the consultation document notes that the investigators “have re-analysed the main outcome measures according to the original protocol with similar results to those in the primary PACE results paper i.e. reduced fatigue and increased physical function.” But it is stretching the truth beyond recognition to claim that the results of the reanalysis were “similar.” In the 2011 Lancet paper, the investigators reported that around 60 percent “improved” with CBT and GET; in the 2016 reanalysis, which used the original PACE protocol’s definition of “improvement,” the figure fell to around 20 percent.
Proponents of PACE have cited the 60 percent improvement rate as evidence of the effectiveness of CBT and GET. So a two-thirds decline in improvement rates should change any reasonable observer’s assessment of the effectiveness of the interventions. For NICE to accept the PACE investigators’ argument that this dramatic drop represents “similar results”—presumably because they were still able to report some very modest “improvement”—suggests that the surveillance team and topic experts are assessing the data with preformed opinions.
The topic experts seem to have enabled some of the NICE surveillance team’s own poor instincts. For example, after noting concerns raised about the Oxford criteria, the consultation document dismisses the significance of the issue this way:
“Trials using Oxford criteria were eligible when developing NICE guideline CG53, and topic experts had no concerns about the inclusion criteria of trials in CFS. It was also noted by topic experts that there is no gold standard definition of chronic fatigue syndrome. There is currently insufficient consistent evidence about diagnostic methods for CFS/ME to determine an impact on the guideline recommendations.”
Hm. So they used these studies the first time around ten years ago, and therefore it must be okay to use them again; something about that logic escapes me. And the topic experts expressed “no concerns,” shrugging off the case definition problem because there is “no gold standard.” But this thorny issue is at the core of the current controversy, and failure to address it is not a viable option. Scientists outside the influence of the CBT/GET ideological brigades understand very well that the populations generated with the Oxford criteria cannot yield actionable findings about an illness that should be defined much more specifically.
The consultation document also states that NICE will encourage Cochrane to update a 2008 review of CBT so that it can include the reported results from the PACE trial. “A further review of the guideline may be considered following publication of the updated Cochrane review,” stated the document. In other words, the NICE surveillance team is not only not deterred from considering the PACE results but is taking steps that would amplify their impact on the recommendations.
In fact, by citing Cochrane’s reviews as key evidence to support keeping the guidelines in place, the consultation document highlights the major role of these analyses in bolstering the entire CBT/GET enterprise. Cochrane takes the same problematic approach to assessing studies as NICE, accepting the results of open-label trials with subjective outcomes even though these are known to suffer from serious bias. In addition, Oxford criteria studies dominate the Cochrane reviews. In short, the body of research being used by both Cochrane and NICE, including but not limited to PACE, suffers from fundamental flaws. (Perhaps if Cochrane removed responsibility for the illness from the “common mental disorders group,” where it doesn’t belong, a new set of reviewers would demand higher-quality evidence.)…