The University of Chicago Journal Club
Edited by B Roitberg
Topic – Biases in Randomized Controlled Trials
Faculty: Ben Roitberg, Sandi Lam, Frederick Brown, Peter Warnke
Residents: Mahua Dey, Ippei Takagi, Nassir Monim-Mansour, Javed Khader-Eliyas, Sophia Shakur, Ashley Ralston
Article #1 – Randomised controlled trial to compare surgical stabilisation of the lumbar spine with an intensive rehabilitation programme for patients with chronic low back pain: the MRC spine stabilisation trial. Jeremy Fairbank, Helen Frost, James Wilson-MacDonald, Ly-Mee Yu, Karen Barker, Rory Collins for the Spine Stabilisation Trial Group. Spine (Phila Pa 1976). 2008 Oct 1;33(21):2334-40.
Dey: In preparation for this Journal Club I tried to assign a class of evidence to this study. As an RCT it should be class I. However, it has flaws in design and presentation. Is it still class I considering the limitations?
The patients were randomized to receive either lumbar fusion (the choice of fusion procedure was left to the surgeon) or intensive rehabilitation (similar regimens between centers, 75 hrs with one day of follow-up at 1, 3, 6 or 12 months). A total of 349 patients were recruited (176 surgery and 173 rehab) in 15 participating centers, and followed for 2 years.
The primary outcome was a composite of two measures. Level of back pain: ODI (Oswestry Disability Index), where 0 = no disability and 100 = severe disability. Mobility: shuttle walking test, measuring maximal walking distance in meters. Secondary outcomes included a general health assessment, a psychological assessment and complications.
Inclusion criteria: clinician and patient uncertainty regarding which treatment option is best; age 18-55; LBP > 12 months.
Exclusion criteria: previous stabilization surgery; significant co-morbidities; psychiatric disease; pregnancy.
Results: 81% follow up at 2 years. Cross-over: 28% of patients from the rehab group had had surgery by the end of 2 years of follow up. On the other hand, 7% of those randomized to surgery opted for rehab.
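As a rough illustration of how the crossover described above can dilute an intention-to-treat comparison, the sketch below applies the standard Wald (instrumental variable) ratio to figures quoted in this discussion: the reported ODI difference of -4.1, the 28% of the rehab group who had surgery, and the 7% of the surgery group who opted for rehab. It assumes that everyone who did not cross over received the assigned treatment, that nobody would do the opposite of whatever they were assigned, and that randomization affected outcomes only through the treatment actually received. None of these assumptions comes from the paper, the function and variable names are hypothetical, and the result is an illustration, not a reanalysis of the trial.

```python
def complier_average_effect(itt_effect, uptake_in_treatment_arm, uptake_in_control_arm):
    """Wald ratio: intention-to-treat effect divided by the difference in
    the proportion actually treated between the two randomized arms."""
    uptake_difference = uptake_in_treatment_arm - uptake_in_control_arm
    if uptake_difference <= 0:
        raise ValueError("treatment uptake must be higher in the treatment arm")
    return itt_effect / uptake_difference


# Figures quoted in the discussion above
itt_odi_difference = -4.1             # ODI points, surgery minus rehab
surgery_arm_had_surgery = 1.0 - 0.07  # assumption: all others had surgery
rehab_arm_had_surgery = 0.28          # crossed over to surgery

cace = complier_average_effect(
    itt_odi_difference, surgery_arm_had_surgery, rehab_arm_had_surgery
)
print(f"Difference in surgery uptake between arms: "
      f"{surgery_arm_had_surgery - rehab_arm_had_surgery:.2f}")
print(f"Illustrative effect among patients who followed assignment: {cace:.1f} ODI points")
```

Under these assumptions the estimate is larger in magnitude than the intention-to-treat figure, because only about 65% of the contrast in assigned treatment was realized as a contrast in treatment received. Whether a difference of that size would be clinically meaningful is a separate question.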
By intent-to-treat analysis there was a small but statistically significant effect of surgery in improving ODI (-4.1). There was no difference in any other outcome measure. The authors concluded that there was no clear evidence for the benefit of surgery over rehab in the treatment of chronic low back pain.
This was a randomized study, a design that is intended to limit bias. The design was also ethically reasonable: uncertainty had to be present in the mind of the surgeon and the patient regarding the optimal course of action, thus allowing ethical randomization. However, this design presents a problem. Essentially, patients were randomized only in cases where the surgeon did not think surgery was really necessary. Surgical complications were few (11 patients needed repeat surgery). The study did not include a cost analysis. Another serious limitation: even patients with a normal MRI could be included. Surgery was also not standardized.
Brown: There are many questions that need to be answered before we draw conclusions from this paper. Patients were randomized to rehab; did all of them have rehab? Did the participating surgeons use any imaging criteria? Under the conditions described in this article, the vast majority of my patients would have refused to enter the study. As a first step before designing a study about the efficacy of interventions, we should ideally have at least an agreement among surgeons on what the indications are. It is difficult in the US given the practice and economic indications.
Warnke: It was a randomized prospective trial and not a controlled trial, because there was no control group. The actual recruitment rate was very low, 5 or so patients per year per center, when there was the necessary uncertainty. Equipoise was not defined clearly enough, and the treatment, i.e. the type of surgery, was heterogeneous and not standardized. Therefore the patient population may not be representative.
The follow up rate was not bad on paper: 20% were lost to follow up. However, for some of the patients the questionnaire was to follow with family. This decreases the value of the follow up. The best follow up is independent, which was not the case here. This study has serious methodological problems, and was not included in the Cochrane report. It is not really class I data. This is a Medical Research Council (MRC) funded trial. MRC might have paid the treatment costs, and would often only give the treatment within 3 months to participants who otherwise would go on a wait list, providing an inducement not available in the US. MRC support also provides an impression of importance that helped get this study published, but it does not provide information sufficient to change clinical practice.
Lam: This study is designed to be a randomized controlled trial comparing surgical intervention to intensive non-operative rehabilitation for chronic low back pain in the UK. The Medical Research Council funded this study and was represented on the steering committee. Even though this study may be claimed to provide Class 1 evidence because it is a randomized controlled trial, it is of low relevance to neurosurgical practice because of its problems with internal and external validity.
Starting with the inclusion criteria, the study group does not fit into the standard of care in the US. The patients had to have 12 months of chronic back pain, but there was no uniformity in imaging workup or conservative management leading up to the “uncertainty of outcome principle” with which patients were supposed to be accrued. Over 15 centers and the time period of 1996-2002, only 349 patients were enrolled, which is on average fewer than 25 patients per center and fewer than 4 patients per center per year.
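The recruitment arithmetic above can be checked in a few lines. This is only a sketch using the figures quoted in the discussion (349 patients, 15 centers, 1996-2002). The length of the recruitment period is an assumption: 1996 to 2002 is treated as a 6-year span, and counting 7 calendar years instead lowers the yearly rate further.

```python
patients_enrolled = 349          # 176 surgery + 173 rehab
centers = 15
recruitment_years = 2002 - 1996  # assumption: a 6-year span

per_center = patients_enrolled / centers
per_center_per_year = per_center / recruitment_years

# Alternative assumption: 7 calendar years (1996 through 2002 inclusive)
per_center_per_year_7 = per_center / (recruitment_years + 1)

print(f"Patients per center: {per_center:.1f}")
print(f"Patients per center per year (6 years): {per_center_per_year:.1f}")
print(f"Patients per center per year (7 years): {per_center_per_year_7:.1f}")
```

Under either assumption each center enrolled, on average, only a few patients per year, which supports the concern about how representative the enrolled patients were.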
The total number of lumbar spine patients treated by these 15 centers over this time period is not provided, and the study patients would seem to constitute a very small fraction of the denominator. The characteristics of these patients in comparison to the rest of the lumbar spine patients treated are not known. The authors point out that there were 3 intended subgroups during recruitment for later subanalysis, but the vast majority of patients were recruited from one group only. Selection bias may also be present in that the patients who would most benefit from surgery were already excluded because of “certainty” on the surgeons’ or patients’ part. The time from randomization to intervention received is unusually short for most UK practices, with the large majority in less than 3 months, which may have provided selection bias in terms of incentives to certain types of patients to enroll in the study.
Follow up was not reported well. While there is just over 80% response at 24 month follow up, responses were not necessarily directly from the patients, as investigators describe efforts to collect data on primary and secondary outcomes (ODI, SF-36, shuttle walking test, psychological assessments) also via mail, phone, family doctor, and national databases. The breakdown of type and mode of response was not given. The quality of such outcomes data may be of limited value.
Surgical intervention was not uniform, ranging from different fusion techniques to dynamic stabilization techniques. These are not comparable to each other, and represent a heterogeneous group with heterogeneous outcomes. Blinding of the patients and of the trial research therapists was not possible due to the invasive nature of surgery. Improvements in back pain in both groups may reflect natural history and resolution of low back pain, or regression to the mean. In this paper, conservative treatment was not defined.
For those practicing in the US, we should take note that only about 1000 fusions are performed each year in England.
For residents: look at Table 1, randomization. It appears beautiful, a proper Table 1 for a randomized trial. This is a table where we see the baseline characteristics of the patients. Are the two groups comparable, showing internal validity? Although superficially the table looks good, the population was not well defined. They also did not look at obesity and other factors that are relevant to the outcome of spine surgery.
The authors conclude that there is no clear evidence that primary spinal fusion is more beneficial than intensive rehabilitation with cognitive behavioral therapy in the management of chronic low back pain. These conclusions would be difficult to support from a neurosurgeon’s perspective. The study group as defined may or may not have had indications for surgical management to begin with. Even though Table 1 shows good randomization, and this was a well-funded national UK study, the large deficiencies in internal and external validity cannot be overlooked. This study cannot be used by surgeons to guide their own practice, though it serves to call for further, better-designed studies with higher recruitment and follow up rates to help address the question of lumbar fusion versus non-operative rehabilitation management of chronic low back pain.
Mansour: This study comes from an environment different from ours. In Britain there are long waiting lists, and patients leave the list for private surgery. Waiting lists can be 6 months or more. This changes the incentives compared to the US, and thus the ability to recruit patients. For proper analysis of surgical outcomes, the procedure should be standard, not random.
Roitberg: This article demonstrates how a prospective randomized trial can be designed and presented in a way that does not add to existing knowledge.
In our practice, we do not offer surgery to patients if we are not really convinced that it is necessary. We always use imaging to confirm that a treatable pathology exists before we offer surgical treatment. It is difficult to extrapolate from this article to our patients. The conclusion I can draw from this article is: “When the surgeon is not convinced that an operation is necessary for back pain, when surgical or medical treatment are not standardized, and when imaging is not consistently used or reported - in the non-representative sample of population that was studied surgical treatment appeared to offer no benefit over non-operative management.”
The criterion that the surgeon had to be uncertain that surgery is necessary is not the same as clinical equipoise regarding the choice of optimal procedure. The intent here was to have the ethical and practical ability to randomize the patients, but the result was limiting the study population to those who may not have needed surgery - according to their own surgeons’ assessment.
_____________________________________________________
Article #2 – Comparison of artificial disc vs ACDF. Sasso RC, Smucker JD, Hacker RJ, Heller JG. Spine (Phila Pa 1976). 2007 Dec 15;32(26):2933-40.
Khader-Eliyas: The goal of disc replacement is to retain better biomechanics, especially in younger patients, decrease intradiscal pressure and reduce degeneration in adjacent levels. This report is part of a larger Medtronic-sponsored study. This article is a follow up of a fourth of those patients over 2 years. Outcome measures included NDI, SF-36, and radiographic measures. The treatment arms were one-level ACDF (Anterior Cervical Discectomy and Fusion) compared to disc replacement.
Inclusion criteria: single-level disc disease, adults above 21 years of age, disease within the C3-C7 levels, and failed conservative therapy for 6 weeks (except for myelopathy).
Exclusion criteria: skeletal deformity, more than 3.5 mm of translation, or angulation of more than 11 degrees.
Bryan discs (titanium and polyurethane) were implanted. 115 patients were enrolled; 56 had disc replacement and 59 had ACDF. The groups were comparable and there was a high rate of follow up. All patients did well after either operation. A difference was noted on pain scales and on the SF-36 physical components; both were better with the artificial disc. The difference was statistically significant, but its actual size was small.
The key limitation of this study is that industry-sponsored trials may be biased. This can affect selection and reporting of results, consciously or unconsciously. One or more of the authors benefited financially from the product.
Dey: Is there a difference between young and old patients in response to these procedures? The average age in the study was 40, and older patients were not included. Are artificial discs appropriate for any age?
Lam: This is a prospective randomized study with a total of 115 patients randomized in a 1:1 ratio between Bryan artificial disc replacement and anterior cervical fusion with allograft and plate, at 3 centers involved in the US FDA IDE trial for evaluation of the Bryan artificial cervical disc. The defined outcome measures are functional outcomes using the validated self-report measures NDI and SF-36, numerical VAS pain rating scales, and radiographic assessments. They are available for 110 patients at 12 month follow up and for 99 patients (86%) at 24 month follow up, with data collected at one preoperative time point and at 3, 6, 12, and 24 months postoperatively. Inclusion and exclusion criteria are defined clearly. This is a well-designed study. Table 1 shows comparable characteristics between the investigational Bryan artificial disc group and the control group. The authors conclude that the study demonstrates favorable outcomes of Bryan cervical disc arthroplasty versus the gold standard of ACDF at 24 month follow up, and they correctly recognize the need for more intermediate and long-term data collection in evaluating this device and technique.
It is important to consider the potential biases in this study. According to the disclosure, one or more authors have a commercial interest in this product and benefit from industry funding. While the study may be nicely designed, this conflict of interest is difficult to dismiss, and may bias results toward the Bryan artificial disc. The patient may not be fully blinded to the type of surgery received, so there may be a reporting bias with more favorable outcomes because the investigational group of patients feels that they received the “newer and better” treatment. The radiographic measurements also cannot be blinded given the presence of the implants, even though they are taken by 2 “blinded, trained” observers, so there may be reporting bias there as well.
There may also be an unquantifiable bias in surgical technique: it is anatomically difficult to explain why the Bryan cervical arthroplasty patients have greater improvements in arm pain. Are more thorough decompressions done with the cervical arthroplasty than with the ACDF, even though the reported surgical technique is that the discectomy and decompression are done the same way for every surgery prior to implant placement, regardless of implant?
Biases are always difficult to eliminate, especially in surgical trials. We must be able to critically evaluate these papers and each decide for ourselves if the trial and results are good enough and generalizable enough to convince us to change our own practices. This is a good execution of a well-designed industry-sponsored study, and long-term follow up results will undoubtedly follow.
For the residents: the criteria for cervical spine instability, such as translation and angulation measurements, are generally acknowledged to have been defined originally by studies reported by Panjabi and White in the 1980’s.
Brown: Why would there be a better improvement in radiculopathy with an artificial disc than with the fusion? It makes no sense, unless in cases of artificial disc the surgeons performed a more thorough decompression. This is a clear instance where a systematic bias may have been introduced.
Warnke: What is the size of the beneficial effect observed for arthroplasty? It is a high p, but the power may be low, because the study was not blinded and there is a known “new treatment” effect.
Roitberg: The study raises a few questions. How did the authors determine the translation and angulation criteria? These numbers have been around since the seventies, and have been copied uncritically ever since. We should always look for the source of numbers that everyone just uses uncritically. The study does not address the issue of loss of segmental mobility at the arthroplasty level over time.
A high heterotopic ossification rate (40-70%) has been reported for all types of artificial cervical discs (1). Is there normal mobility at the replaced level? The authors published a biomechanical analysis in a separate article, which suggests maintenance of mobility of 6.7 degrees at the arthroplasty level (2).
Should we use artificial discs in the cervical spine? They appear to be safe to use at least in the short term, with good outcomes, and when used as indicated within the studied limits. Time will tell, as usual, but our duty is to look out for potential systematic biases that may affect results. These are often not conscious, and are not necessarily by the surgeons or the authors at all. Patients may feel better when receiving a new and more advanced treatment. Staff may be enthusiastic about the study or about new technology in general.
The two articles we discussed today provide an opportunity to show that prospective studies are not inherently free from bias. There are pressures and incentives that may affect the conduct of the study and the reporting of the results. The bias that may influence industry-sponsored trials is well known, but a variety of pressures, biases and considerations also affect government-sponsored studies. In this case an industry-sponsored study appears to be better conducted and presented than a government-sponsored study. Neither is free from risk of systematic bias, which is based on personal or institutional desire to see specific results.
There is a deeper question: is it possible to design an unbiased randomized trial in surgery? The challenge is immense. There should be surgical equipoise regarding the choice of procedure, unbiased and uniform performance of procedures, patients blinded to what they are receiving (sham surgery?), and data collection and analysis by independent observers.
A perfect surgical study in humans may rarely if ever be possible, and therefore we must understand the limitations of currently available research. The practice of assigning classes to medical evidence should not replace careful analysis of each article we read, with a working “skeptic’s tool kit”.
References:
1) Yi S, Kim KN, Yang MS, Yang JW, Kim H, Ha Y, Yoon do H, Shin HC. Difference in occurrence of heterotopic ossification according to prosthesis type in the cervical artificial disc replacement. Spine (Phila Pa 1976). 2010 Jul 15;35(16):1556-61.
2) Sasso RC, Best NM. Cervical kinematics after fusion and Bryan disc arthroplasty. J Spinal Disord Tech. 2008 Feb;21(1):19-22.