University of Chicago Neurosurgery Journal Club, June 2011

Edited by: B Roitberg, MD

Faculty: F. Brown, B. Roitberg.

Residents: J. Hsieh, I. Takagi, N. Monim-Mansour, A. Bhansali, J. Khader-Elyias.

1) Weinstein JN, Tosteson TD, Lurie JD, Tosteson AN, Hanscom B, Skinner JS, Abdu WA, Hilibrand AS, Boden SD, Deyo RA. Surgical vs nonoperative treatment for lumbar disk herniation: the Spine Patient Outcomes Research Trial (SPORT): a randomized trial. JAMA. 2006 Nov 22;296(20):2441-50.

Hsieh: This Spine Patient Outcomes Research Trial (SPORT) report evaluates surgical vs. non-surgical treatment for lumbar disc herniation. It is a landmark study, and is perhaps the single most publicized trial evaluating surgical management of back pain in the scientific literature today. As such, the study’s primary conclusion, that there is no difference in outcomes between surgery and non-surgery for lumbar disc herniation, can easily be misinterpreted and misquoted, and it warrants critical evaluation.

SPORT is a randomized clinical trial encompassing 13 multidisciplinary spine clinics. The lumbar disc herniation study enrolled a total of 501 surgical candidates with lumbar disk herniation and persistent signs and symptoms of radiculopathy for at least 6 weeks. Patients were randomized to open discectomy or non-operative treatment. The primary outcomes were changes from baseline for the Medical Outcomes Study 36-item Short-Form Health Survey (SF-36) bodily pain and physical function scales and the modified Oswestry Disability Index (ODI) at 6 weeks, 3 months, 6 months, and 1 and 2 years from enrollment. Secondary outcomes included sciatica severity, satisfaction with symptoms, self-reported improvement, and employment status. Groups were compared using an intent-to-treat analysis. After analysis, the study concluded: “Between-group differences in improvements were consistently in favor of surgery for all periods but were small and not statistically significant for the primary outcomes.” However, the devil is in the details.

The crux of the argument around SPORT lies in the intent-to-treat analysis and patient crossover. Say you are doing a study and you have randomized patients into two equal groups: A and B. By design, Group A is randomized to be your "Surgery" group and Group B is randomized to be your "Non Surgery" group. No matter what treatment a patient ultimately gets, you will analyze that patient according to the initial randomization. By doing this analysis, you preserve your randomization and can gauge the outcomes after prescribing treatment (i.e. your intent to treat) regardless of the treatment the patient ultimately receives. This is what researchers mean by “intent-to-treat” analysis.

Now, if there is no deviation from the randomized group and actual treatment (i.e. everyone behaves like they are supposed to) then the groups look something like this.

Intent-to-Treat: Everyone adheres to assigned treatment

If surgery really does improve outcomes, you should be able to tell by comparing these groups.

In a surgical study like SPORT, because there is no way to force a person to get surgery or not get surgery, some will decide to choose the other option (i.e. "crossover"). Let's say 40% of Group A, the supposed "surgical" group, changes its mind and opts for a non-surgical treatment instead. The groups then look like this:

Intent-to-Treat: Group A Crossover

Now let's say that 45% of Group B, the supposed "non-surgical" group, changes its mind and crosses over to get surgery. The groups then look like this:

Intent-to-Treat: Group A and B Crossover

Now you compare Group A and B. Why? Because you've committed to do an intent-to-treat analysis. However, because of all the people who have crossed over, Group A now looks remarkably similar to Group B. It is as if instead of comparing iced tea to lemonade, you must now compare iced tea-lemonade to lemonade-iced tea. The two become remarkably similar. It comes as no surprise that when you compare the two groups, the outcomes aren't very different, and your ability or “power” to determine whether surgery was in fact better has been largely washed out.

This is unfortunately what happened in SPORT. Specifically, at 2-year follow-up, among Group A patients, the supposed "surgery" group, only 60% had surgery and 40% did not. Among Group B, the supposed "non-surgery" group, 45% had surgery and only 55% did not, exactly as in the above illustration. In fact, the “non-surgery” patients who ultimately opted for surgery were far more likely to start off with worse symptoms and disability than those who did not cross over, suggesting they self-selected to surgery when they felt conservative therapy was ineffective. Their eventual outcome was nevertheless credited to the non-operative arm. Again, the observed differences in primary and secondary outcomes between Groups A and B are small and not statistically significant because the groups are almost identical. To their credit, the authors warn that “because of the large number of patients who crossed over in both directions, conclusions about the superiority or the equivalence of treatments are not warranted based on intent-to-treat analysis.” Unfortunately, none of this “small print” is captured in the publicized sound bites.
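The dilution described above can be made concrete with a small calculation. The sketch below is only an illustration, not part of the SPORT analysis: it assumes that surgery adds the same fixed benefit to every patient and that crossover is unrelated to prognosis (an assumption the baseline differences noted above suggest did not hold in SPORT). The function name `itt_difference` and the 10-point benefit are hypothetical; only the 60% and 45% surgery rates come from the discussion.

```python
def itt_difference(true_effect, surgery_rate_a, surgery_rate_b):
    """Expected intent-to-treat difference between arm A and arm B.

    Hypothetical model: every operated patient gains `true_effect`
    points, and who crosses over is unrelated to prognosis.
    Each arm's mean is baseline + (share operated) * true_effect,
    so the baseline cancels in the difference.
    """
    return (surgery_rate_a - surgery_rate_b) * true_effect


# Made-up benefit of surgery: 10 points on some outcome scale.
true_effect = 10.0

# Everyone adheres: 100% of arm A and 0% of arm B have surgery.
full_adherence = itt_difference(true_effect, 1.00, 0.00)

# Rates quoted in the discussion: 60% of arm A and 45% of arm B had surgery.
with_crossover = itt_difference(true_effect, 0.60, 0.45)

print(f"ITT difference with full adherence: {full_adherence:.1f} points")
print(f"ITT difference with crossover:      {with_crossover:.1f} points")
```

Under these simplifying assumptions, only 15% of the true benefit (60% minus 45%) would remain visible in the comparison between the randomized groups.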

Brown: In my practice, the most common reason patients choose surgery is because symptoms are worsening. It may not be possible to design an intent-to-treat study for a condition that is as dynamic as radiculopathy from herniated disc. This first SPORT study did not discuss the crossover effect adequately, generating a bias against surgery. However, I am not sure it had a practical effect on the frequency of this operation.

Mansour: Multiple factors may influence patient choice of treatment, adherence to treatment plan, and chance of surgical success. Among those factors are the imaging appearance of the herniated disc, socio-economic status of patients, home support and other factors. We do not know if the groups were matched in that regard.

Roitberg: Dr Hsieh presented an excellent discussion of this multi-center trial of treatment of lumbar radiculopathy. The study required a giant effort - anyone who tried to organize a controlled trial and follow patients up thoroughly at a high response rate knows what a huge achievement this study represents.

However, the initial reports from SPORT were quite limited beyond the problem of large crossover. Please note that patients were randomized after 6 weeks of symptoms, which is an arbitrary number the study participants could agree upon; too short for many patients who have pain only, but too long for those who have symptoms that are rapidly progressing or severe. Patients with progressive neurological symptoms were not included in the study, and are expected to be chosen for surgical treatment immediately when the symptoms occur.

As Dr Brown noted, lumbar disc herniation is a dynamic condition where symptoms can change for better or worse. It is difficult to assign patients to rigid groups for changing symptoms of a dynamic disease process. In prior journal clubs we talked about equipoise – the situation where the surgeon (or the patient) truly believes that either of the options offered in the randomization is equally reasonable. If there is no equipoise, the surgeon, the patient, or both will not adhere to the assigned treatment. Equipoise was not discussed in this study; moreover equipoise is not possible when the condition of the patient can rapidly change over time.

Another criticism of the study centers on the lack of a standard medical management. It is possible that some forms of non-operative management are more effective than others. Some patients may have received sub-optimal medical management, and thus both patient adherence and the apparent effectiveness of non-operative management were compromised.

In this American study all the patients could eventually choose what they wanted; those who were randomized to non-operative management but felt they needed surgery because medical treatment was failing were operated on, with the result of surgery assigned to the medical group by the intent-to-treat analysis. In other countries, with a more restrictive medical care delivery system, adherence to a treatment arm can possibly be enforced by denial of benefits to those who do not adhere to the assigned treatment. Many in the US would consider this unethical.

So, if the intent-to-treat analysis was wrong in this case, why the insistence? Why was the study designed to be analyzed this way in the first place? The study was planned as a randomized controlled trial (RCT). As-treated analysis eliminates randomization and is not suitable for an RCT. Intent-to-treat analysis is the gold standard for clinical trials, but it is not the only way to analyze the results of a clinical study. Patients who did not adhere to the treatment to which they were randomized can be excluded from final analysis. Such exclusion preserves randomization and can result in a greater apparent power to find a difference between the groups, but increases the risk of type I error (false positive) [1]. Any exclusion of patients after the randomization (called “post-hoc” exclusion) presumes that the missing data are missing at random. This may not be true. Data may be lacking from a treatment arm for such non-random reasons as more side effects from one of the treatments. Eliminating those patients who could not tolerate the side effects may overestimate the efficacy of that treatment in real life. Intent-to-treat analysis takes that into account because it evaluates the efficacy of the decision to choose one treatment over another: lack of adherence is not assumed to be random but is counted against the treatment not adhered to.
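A small worked example may help show why excluding non-adherent patients can overestimate efficacy. Every number below is invented for illustration and does not come from SPORT or from reference [1]; the sketch also assumes that patients who stop treatment because of side effects do not improve.

```python
from fractions import Fraction

# Hypothetical treatment arm; all counts are made up for illustration.
randomized = 100         # patients assigned to the treatment
stopped_early = 30       # stopped because of side effects (assumed not to improve)
improved_adherent = 49   # improved among those who stayed on treatment

adherent = randomized - stopped_early

# Post-hoc exclusion: only patients who adhered are counted.
per_protocol_rate = Fraction(improved_adherent, adherent)

# Intent-to-treat: everyone randomized to the arm is counted.
intent_to_treat_rate = Fraction(improved_adherent, randomized)

print(f"Response rate after excluding non-adherent patients: {float(per_protocol_rate):.0%}")
print(f"Response rate by intent-to-treat:                    {float(intent_to_treat_rate):.0%}")
```

In this made-up arm the treatment looks like it helps 70% of patients once the dropouts are excluded, while only 49% of those for whom it was chosen actually improved. The second figure is the one a clinician recommending the treatment would experience.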

Moreover, intent-to-treat analysis is especially powerful when an effective treatment (like surgery) arrests the progression of the disease, so that the patient benefits long after the treatment. A competing treatment that requires ongoing compliance will appear less effective on intent-to-treat analysis because the total number of patients who become non-compliant is expected to increase over time.

On paper, the design of the study was optimal to evaluate the efficacy of surgical intervention. In practice, the dynamic nature of the disease, lack of equipoise and freedom of patient choice resulted in a crossover large enough to practically invalidate the randomized intent-to-treat analysis.

2) Weinstein JN, Lurie JD, Tosteson TD, Tosteson AN, Blood EA, Abdu WA, Herkowitz H, Hilibrand A, Albert T, Fischgrund J. Surgical versus nonoperative treatment for lumbar disc herniation: four-year results for the Spine Patient Outcomes Research Trial (SPORT). Spine (Phila Pa 1976). 2008 Dec 1;33(25):2789-800.

Hsieh: The next article is another report from SPORT. This time, instead of reporting results based primarily on intent-to-treat, Weinstein et al. report 4-year outcomes based on an as-treated analysis for both the original prospective randomized groups (501 participants) and additional observational cohorts (743 participants). In this analysis, the original randomization is ignored and instead participants are categorized based upon how they were actually treated. Here the results are overwhelmingly in favor of surgery on all counts, durably so over 4 years. Surgery patients had far less pain, better physical function, and less disability than non-surgery patients. Indeed, the authors conclude that “In a combined as-treated analysis at 4 years, patients who underwent surgery for a lumbar disc herniation achieved greater improvement than non-operatively treated patients in all primary and secondary outcomes except work status.”

Roitberg: The article is simple to understand. Its main value is the demonstration of a durable effect of surgery over time. Few of us follow patients for 4 years after surgery. Using as-treated analysis negates the original randomization. This is now an observational cohort study. As before, the study suffers from some limitations: “usual nonoperative care” is ill defined; the surgical treatment is “standard open discectomy”, whereas many surgeons now typically perform a minimally invasive procedure. The study does not include a patient-driven subjective assessment of outcome. It is an important parameter of outcome, often neglected just because it is subjective.

3) Weinstein JN, Lurie JD, Tosteson TD, Zhao W, Blood EA, Tosteson AN, Birkmeyer N, Herkowitz H, Longley M, Lenke L, Emery S, Hu SS. Surgical compared with nonoperative treatment for lumbar degenerative spondylolisthesis: four-year results in the Spine Patient Outcomes Research Trial (SPORT) randomized and observational cohorts. J Bone Joint Surg Am. 2009 Jun;91(6):1295-304.

Bhansali: In this paper the authors selected patients with lumbar spinal stenosis causing neurogenic claudication and/or radicular pain for at least 12 weeks; radiographic evidence of spinal stenosis; degenerative spondylolisthesis on standing lateral radiographs; and who were judged to be candidates for surgery. Patients with spondylolysis and isthmic spondylolisthesis were excluded. Surgical intervention was a “standard” posterior decompressive laminectomy with or without bilateral single level fusion with autologous graft or pedicle screws. The intent-to-treat analysis showed no statistical difference between surgical and non-surgical groups, as expected given the large crossover. The as-treated analysis of both patient groups showed an early surgical benefit that persisted over the 4 years of follow up.

SPORT was the largest outcome study of spine patients, and thus is often quoted by medical professionals. However, the low adherence rates to the assigned treatment arms resulted in loss of predictive power – the theoretical advantage of intent-to-treat design.

Mansour: The authors excluded many patients who are routinely treated by us for spondylolisthesis. They also did not report the results of imaging such as flexion and extension radiographs. If the patient is unstable on such imaging, the surgical indication would be stronger in our practice.

Khader-Elyias: It is difficult to draw any conclusions from the articles we discussed so far that would change our practice or the way we discuss the surgical risks and benefits with the patients.

Roitberg: The article presents a varied group of patients. Some were enrolled in a randomized trial, others were not, and the operations were variable: laminectomy with or without fusion. There is ongoing controversy regarding the indications for fusion and how to do it, yet all these patients were analyzed as one group. As in other parts of SPORT, patients often chose what they thought was needed regardless of group assignment (54% of the cohort randomized to non-op care eventually received surgery), and surgeons did what they thought was best for the patient. I see the greatest asset of this study in the duration of the follow up. Few spine surgeons in practice keep following patients up for 4 years. It is encouraging to see persistent benefit for surgery.

4) Weinstein JN, Tosteson TD, Lurie JD, Tosteson A, Blood E, Herkowitz H, Cammisa F, Albert T, Boden SD, Hilibrand A, Goldberg H, Berven S, An H. Surgical versus nonoperative treatment for lumbar spinal stenosis: four-year results of the Spine Patient Outcomes Research Trial (SPORT). Spine (Phila Pa 1976). 2010 Jun 15;35(14):1329-38.

Bhansali: In this article the authors selected patients with lumbar spinal stenosis causing neurogenic claudication or radicular leg pain lasting for at least 12 weeks; radiographic evidence of spinal stenosis; and who were judged to be candidates for surgery. Patients with instability and degenerative spondylolisthesis were excluded from this study. The surgical intervention was “standard” decompressive laminectomy; non-surgical management included physical therapy, counseling about home exercises, and NSAIDs. The intent-to-treat analysis of the randomized cohort showed no difference between operative and nonoperative management, similar to the first SPORT article we discussed. The as-treated analysis of both the randomized and observational cohorts showed improved outcomes with surgery, both in early postoperative follow-up and 4 years later.

As the authors noted, the study had a very high rate of crossover. The non-operative management also did not include many modalities that are common and may have different degrees of efficacy, like epidural injections, chiropractic, etc.

Mansour: Current literature does not include studies of the natural history of diseases. Symptomatic lumbar spinal stenosis is a progressive disease, so a decision not to operate at a certain point in time may simply be delaying the inevitable (if the patient lives long enough to suffer progression of the disease). Studies of the natural history may be successfully performed in less developed countries where many people simply do not have access to care. The question is who will be doing the observation.

Roitberg: This is a relatively straightforward study. It is good to know that surgical results are good in the longer term. Neither group, surgical or non-operative, included all the possible or even all the best treatment modalities. Less invasive partial laminectomy with medial facetectomy is very common, and minimally invasive procedures are also common with potentially fewer complications and more rapid recovery. Please note the use of SF-36 and ODI – these are common outcome measures (ODI best validated for spine) that we also use in our own outcome database.

The SPORT intent-to-treat analysis has been criticized, and we just did so in our discussion of the 2006 discectomy paper above, yet its defenders may claim that it does demonstrate a non-trivial finding: the initial choice to do surgery or not does not matter. According to this analysis, the two groups eventually had the same outcome, since all those for whom non-operative management did not work could have surgery and improve. Does this mean that we should first choose non-operative treatment for all patients? Probably not! The as-treated analysis demonstrated that those patients who had surgery did better than those who did not. We may speculate about the meaning of this result.

Had all the patients who could benefit more from surgery been operated upon, and all those who would benefit more from medical management received it, there should actually be no difference between the surgical and non-operative groups on as-treated analysis. If the outcome of the operated patients as a group was better, arguably some patients in the non-operative group may have benefited more if they were in the operated group instead. In other words, the crossover to surgery was not large enough for the patients’ own good. The data suggest that delaying surgery is not a good idea, although some patients can tolerate delaying surgical intervention for varying periods of time.

5) Adam Pearson, MD, MS, Emily Blood, PhD, Jon Lurie, MD, MS, William Abdu, MD, MS, Dilip Sengupta, MBBS, John W. Frymoyer, MD, and James Weinstein, DO, MS. Predominant Leg Pain Is Associated With Better Surgical Outcomes in Degenerative Spondylolisthesis and Spinal Stenosis: Results From the Spine Patient Outcomes Research Trial (SPORT). Spine (Phila Pa 1976). 2011 Feb 1;36(3):219-29.

Khader-Elyias: This study is a retrospective analysis of prospectively collected data. In this article the authors aimed to analyze the relation between predominant pain location and the efficacy of intervention. Patients in the SPORT trial with a diagnosis of either spinal stenosis (SpS) or degenerative spondylolisthesis (DS) were divided into pain-predominance groups based on leg pain bothersomeness and low back pain bothersomeness scores. Subjects with equal scores on both scales were categorized as the ‘equal group’. All three pain groups were randomized into the two arms of the trial, namely surgical and non-operative. Treatment effect was measured with multiple tools such as SF-36, Oswestry Disability Index (ODI), and stenosis bothersomeness index, as well as the pain bothersomeness scales.

Follow up data was obtained for 2 years and shows more than 90% response in both SpS and DS cohorts. Almost half of the patients had equal leg and back pain while a quarter had more back pain than leg pain. This distribution was uniform over both disease groups. Most if not all demographic data was similar between the pain groups for both conditions and the severity of imaging findings was also similar. Despite the large crossover allowed between the treatment arms, pain location did not play a part in influencing treatment choices with equal percentage of patients in all three pain groups undergoing surgical management.

Among surgically treated patients, predominant leg pain had a better prognosis than predominant back pain. In the non-operative treatment cohort the only improvement was seen in leg pain bothersomeness scores at 1 and 2 years.

In summary, the trial supports the notion that patients who predominantly complain of leg pain rather than back pain do better than those who have mostly back pain. I heard this before from neurosurgeons who had this impression from their practice, but this study has the numbers and the long follow up to provide strong evidence. The crossover among the groups had no bearing on the analysis – the patients were not randomized or stratified by symptom location. Overall the results of this article can be used in our practices – for example to guide surgical decision and communication with patients with good confidence that they are true.

Roitberg: This article addresses an interesting question – is our impression correct that patients who have only leg pain or mostly leg pain do better with surgery (or any treatment) compared to those who have mostly back pain? The data seem to strongly support this concept. The article also re-demonstrates the best current use of the SPORT data set – this is a large set of relatively long term and detailed outcome data for common spine problems. We can ask many clinically relevant questions within this data set. Although the large crossover that occurred in all diagnostic categories makes the randomized component of the trial essentially useless, the trial is still providing useful and important information. This study also addressed the findings on imaging studies, an important component in surgical decision-making.

After reading the study, we still do not know why patients with leg pain only do better, but we can advance some theories. Patients with leg pain only had baseline scores indicative of less severe symptoms in general. Their greater perceived improvement may be simply the result of being less severely affected to begin with. We know from our own experience (unpublished data) that patients tend to perceive outcome mostly based on their condition at the time of filling the follow up questionnaire, rather than on the basis of relative improvement. Thus, improvement from pain at 4/10 on the visual analog scale (VAS) to 1/10 may feel greater than that from 9/10 to 5/10. Pain perception is not linear and the VAS is therefore not linear.

In all the branches of SPORT, surgical treatment was more effective than medical treatment on as-treated analysis, but it is hard to make generalizations that would apply to our practice. Intent-to-treat analysis can ask and ideally also answer the question: “which recommendation that I give now is likely to lead to a better outcome at a specified time in the future?” As-treated analysis cannot answer this question. Given the crossover, the treatment each patient actually received appears to have been determined by patient and surgeon choices, based on the patient’s condition at the time.

The studies support current practices – patients operated upon according to current practices mostly improved and did better than those who did not have surgery. This outcome makes me suspect that operative treatment was actually underutilized. Perfect selection of the optimal treatment for each patient should theoretically result in equal outcome in both treatment arms – operative and non-operative. The studies do not offer much information that changes the way we practice spine surgery. The key judgment questions regarding patient and treatment selection remain unanswered. For example, it makes sense that back pain associated with marked instability is more likely to improve as the result of a stabilization procedure compared to stable spinal stenosis with back pain only (rather than leg pain or classical neurogenic claudication). Studies without strict indications for surgery and uniform surgical techniques are less valid. In order to get further proof of efficacy of surgical treatment for degenerative disease of the lumbar spine, a prospective study should take equipoise into consideration, and aim to standardize both surgical and medical treatment branches.

Additional reference:

1) Lachin JM. Statistical considerations in the intent-to-treat principle. Control Clin Trials. 2000 Jun;21(3):167-89.
