Address reprint requests to Dr Khuri, Department of Cardiovascular Surgery, VA Boston Healthcare System, 1400 Veterans of Foreign Wars Parkway, West Roxbury, MA 02132 USA
Thank you, President Orringer, for your gracious introduction and for giving me the honor of delivering the Thomas B. Ferguson Lecture for 2002. Tom Ferguson is a giant in our field, exemplifying all that a surgeon and a human being should aspire to be. It is most fitting that we celebrate his legacy today with a discourse on issues that should matter very deeply to us as surgeons: the assessment of the quality of our care, the tools that enable us to advocate successfully for the well-being of our profession, and the dire need for us to shape the healthcare policies that affect us. Tom Ferguson would agree that there is a common thread weaving through all of these imperatives, the same thread that I hope to weave through my talk to you this morning. At the outset, I would also like to acknowledge my numerous colleagues in the Department of Veterans Affairs (VA) National Surgical Quality Improvement Program (NSQIP), including its executive committee, the more than 120 chiefs of surgical services, and the equally numerous clinical nurse reviewers. In particular I would like to acknowledge my co-chair of the NSQIP, Jennifer Daley, MD, and our lead biostatistician William Henderson, PhD.
These are hard times for surgeons. As we strive to improve the care of our patients and advance the boundaries of our respective fields, forces external to the surgical profession are setting for us standards for the care of our own patients, dictating to us the minutiae of our day-to-day management of these patients, deciding for us what is acceptable and what is unacceptable quality of care, and determining for us equitable compensation schemes. Take, for example, the latest of these external infringements: the standards recently set by the Leapfrog Group. The Leapfrog Group is a conglomeration of purchasers of health care that includes more than 72 Fortune 500 companies. As a group, they have in excess of 24 million employees who represent $45 billion in healthcare expenditure. This group of employers has set standards for urban hospitals to meet before it would contract with them for the care of their millions of employees nationwide. These standards include (1) 24-hour-per-day, 7-day-per-week coverage of intensive care units by intensivists; (2) implementation of electronic physician order entry; and (3) what was referred to as evidence-based hospital referral, based on minimum volume requirements in five major operations, including coronary artery bypass grafting, esophagectomy, and abdominal aortic aneurysmectomy. The latter standard implied that a high volume of surgery was synonymous with better outcomes. I will have more to say about these standards later in this talk. However, whether you agree with them or not, the fact that they were set not by our own professional societies but by a conglomerate of healthcare purchasers should be alarming to each one of us, just as we should be alarmed by the standards and policies being set for us by various states, by the federal government, and even by the Joint Commission on Accreditation of Healthcare Organizations.
The road to the national surgical quality improvement program
External standard-setting and infringement on our surgical specialties is déjà vu for VA surgeons. In the mid-1980s, the quality of VA surgery came under a barrage of criticism from the media, which claimed that postoperative outcomes of surgery in VA hospitals were worse than those in the private sector. Those of us in the VA at the time were convinced that the discrepancy in unadjusted outcomes between the VA and the private sector did not reflect a difference in quality of care, but was primarily related to the fact that patients referred to the VA, as a group, were sicker and more complex than patients referred to the private sector. However, there were no reliable data to support this conviction. The criticism of VA surgery prompted the US Congress, late in 1986, to enact Public Law 99-166, which mandated that the VA report its surgical outcomes in comparison with the national average, and that these outcomes be risk-adjusted to account for differences in severity of illness between VA and non-VA populations. The response to this congressional mandate, and what evolved from it over the years, will be the subject of my talk because it exemplifies a paradigm in which surgeons took it upon themselves to establish a reliable infrastructure for the comparative assessment of the quality of care of their patients, and for addressing the advocacy and healthcare policy issues that mattered to them and their profession.
The VA is the largest single-provider healthcare delivery system in the United States. The Veterans Health Administration (VHA) comprises 128 medical centers that perform major surgery, of which 42 perform open heart surgery. Cardiac surgery in the VA provided the impetus and the road map for the response to the congressional mandate. Two years before the congressional mandate was issued, two visionary members of the VA Cardiac Surgery Consultants Committee, Fred Grover, MD, and Karl Hammermeister, MD, started prospective data collection in all VA cardiac surgical centers and developed a novel system for risk adjustment of outcomes in cardiac surgery. However, this effort was hampered by inadequate funding and resulted in only 60% to 70% completion of data collection. Thus, when in 1988 a group of us was consulted by the VA to advise it on how best to respond to the congressional mandate, we pointed out that at that time there were no known acceptable national averages for outcomes of the various surgical specialties, and that, except for a limited experience in cardiac surgery, there were no known credible models for risk adjustment of surgical outcomes. We argued, however, that the VA was in a unique position to lead the nation in developing national norms and risk-adjustment models because of its centralized administrative structure, its uniform information technology infrastructure, and its experience with risk adjustment in cardiac surgery. Our committee succeeded in convincing the VA to initiate, in 1991, the National VA Surgical Risk Study, with the goal of developing and validating risk-adjustment models for the prediction of surgical outcome and for the comparative assessment of the quality of major surgical care among the VA surgical centers.
Donabedian, in his classic treatise on quality of care, defined three dimensions of health care that can be used in the assessment of quality: structure, which describes the attributes of how healthcare systems are organized; process, which describes what we do to and for our patients; and outcome, which describes the changes in patients’ health status that may be attributed to the healthcare process. Surgery has the advantage of being ideally suited for the use of outcome measures in the assessment of quality, because surgical care revolves primarily around a predictable single event (the operation), which, in most cases, has an expected and a measurable outcome. Hence, the rationale underlying the National VA Surgical Risk Study was based on what Iezzoni, a leading expert on risk adjustment of outcomes, terms the “algebra of effectiveness”—a conceptual framework in which outcomes of health care are determined by the sum of three major factors: patient risk factors before surgery, the effectiveness (or quality) of the patient’s care, and random variation. If one accounts for the severity of the patient’s illness by proper risk adjustment, and for random events by proper statistical methods, one can then equate outcome with effectiveness of care. Hence, to enable the use of outcome as a measure of quality of surgical care, the National VA Surgical Risk Study had to (1) develop a reliable clinical database of the patients’ relevant preoperative risk factors and postoperative outcomes, and (2) develop analytic tools for proper risk adjustment and for accounting for random events (Fig 1).
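The “algebra of effectiveness” can be written schematically as follows (the notation is my own shorthand for this lecture, not taken from the original reports):

```latex
% For patient $i$, the observed outcome decomposes into preoperative risk,
% the effectiveness (quality) of care, and random variation:
\mathrm{outcome}_i \;=\; f(\mathrm{risk}_i) \;+\; \mathrm{quality\ of\ care} \;+\; \varepsilon_i
```

Accounting for the first term by risk adjustment and for the last by proper statistics is what lets outcome stand in for quality.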
Fig 1The National Surgical Quality Improvement Program is based on the assumption that outcomes in surgery are determined by the sum of the patients’ risk factors, effectiveness or quality of care, and random variation. By providing a database to account for patients’ risk factors and outcomes, and analytic tools for risk adjustment and to account for random variation, the National Surgical Quality Improvement Program is able to equate the measurement of outcome with the measurement of quality of care. (Adapted from Khuri SF, Daley J, Henderson WG. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg 2002;137:20–7, by permission of the American Medical Association, 2002.)
Initiated in 44 VA medical centers, the hallmark of the VA National Surgical Risk Study was a trained dedicated clinical nurse reviewer at each facility who, working closely with the chief of surgery, prospectively collected risk and 30-day outcome variables on all major cases, and electronically transmitted non-cardiac surgery data to a data coordinating center at the VA in Hines, IL, and the cardiac surgery data to another coordinating center at the VA in Denver, CO. Of note is that the completion rate of the cardiac surgery data collection rose to 99.8% after the risk study was initiated—a marked increase from the 60% to 70% completion rate that had prevailed before the assignment of dedicated nurses to collect these data. Risk-adjustment models were developed in the two data coordinating centers and fed to the study executive committee for periodic review and analysis. From its inception, the VA National Surgical Risk Study was guided by a panel of experts from outside the VA that met with the executive committee on a regular basis. The panel was chaired by Barbara McNeil, MD, head of the Department of Health Policy Research at Harvard Medical School, and included renowned experts in surgery, health policy research, biostatistics, and epidemiology. The surgeons on the panel were John Mannick, MD, Brad Aust, MD, and Paul Ebert, MD (then director of the American College of Surgeons).
The outcome variables included death and complications within the first 30 days postoperatively. The complications were prospectively categorized into 22 groups. After developing and comparing 11 morbidity scoring schemes, the final statistical models were developed with a dichotomous morbidity score based on whether or not a patient had one or more complications. To account for variation in the complexity of the operations among institutions and subspecialties, panels of six expert surgeons in each specialty developed a complexity score for each of the more than 3,000 CPT codes contained in the database. Interrater reliability was assessed by two traveling coordinators who made site visits to each medical center and re-abstracted a sample of cases from each site. The resultant kappa statistics indicated good interrater reliability for all types of variables collected. Complete data were collected on 103,342 operations between October 1, 1991, and December 31, 1993. The cardiac operations were analyzed and reported separately. For non-cardiac surgery, nine predictive models of 30-day mortality were constructed: one for all operations and one for each of eight major surgical subspecialties. High C-indices, which ranged from 0.79 to 0.91, indicated excellent predictability for all of these models. (A C-index of 1.0 indicates perfect predictability and a C-index of 0.5 indicates no predictability.) Similar models were constructed for 30-day morbidity. Preoperative serum albumin was by far the most important predictor of 30-day mortality and morbidity in the all-operations model. By generating a beta coefficient for each of the predictive variables in these models, the logit equation was used to determine an expected mortality or morbidity rate for any given population of patients. Knowing the actual, or observed, mortality and morbidity rate for that patient population, one could generate an O/E (observed-to-expected) ratio for each of these outcomes. Figure 2 shows the overall mortality O/E ratio for each of the 44 participating medical centers during the 27-month period of the study. The hospitals are arranged in order of increasing O/E ratio, which ranged from 0.49 to 1.53. Asterisks indicate the statistically significant outlier hospitals at the 90% confidence level. The high-outlier hospitals on the right side had an observed mortality rate that was significantly higher than that accounted for by the severity of illness of their respective patient populations; the low-outlier hospitals on the left side had an observed mortality rate that was significantly lower. The implication of this figure was that high-outlier hospitals provided inferior quality of care and low-outlier hospitals provided superior quality of care. One of the major accomplishments of the National VA Surgical Risk Study was that it validated these findings: a study led by Jennifer Daley, MD, which included site visits by teams of surgeons, health services researchers, and nurses, demonstrated that hospitals with significantly low mortality and morbidity O/E ratios indeed had superior structures and processes of care, whereas hospitals with significantly high mortality and morbidity O/E ratios had inferior structures and processes of care.
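As a concrete illustration of how a logit equation yields an expected rate and an O/E ratio, here is a minimal sketch in Python. The intercept, risk-factor names, and coefficient values are invented for illustration only; they are not the actual NSQIP model coefficients.

```python
import math

# Hypothetical logistic-model coefficients -- illustrative only, not the NSQIP model.
INTERCEPT = -6.0
BETAS = {"albumin_g_dl": -0.5, "age_decades": 0.4, "asa_class": 0.6}

def expected_mortality(patient):
    """Predicted 30-day mortality probability from the logit equation."""
    logit = INTERCEPT + sum(BETAS[name] * value for name, value in patient.items())
    return 1.0 / (1.0 + math.exp(-logit))

def oe_ratio(patients, observed_deaths):
    """Observed-to-expected mortality ratio for a hospital's case mix:
    actual deaths divided by the sum of predicted probabilities."""
    expected_deaths = sum(expected_mortality(p) for p in patients)
    return observed_deaths / expected_deaths
```

An O/E ratio significantly above 1.0 flags a high outlier (more deaths than the case mix predicts); one significantly below 1.0 flags a low outlier.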
Fig 2Risk-adjusted observed-to-expected (O/E) ratios for 30-day mortality for 44 Veterans Affairs medical centers. ∗Indicates a hospital in which the risk-adjusted observed-to-expected ratio is an outlier at the 90% confidence level.
The successful completion of the VA National Surgical Risk Study provided the VA and the surgical world, for the first time, with a valid outcome-based quality assessment tool for non-cardiac surgery, and verified the utility of the cardiac surgery models. Accordingly, the NSQIP was instituted in 1994; it expanded the methodology applied in the risk study to all 133 VA medical centers that were then performing major surgery, and established an infrastructure whose primary purpose was to continually assess and improve the quality of surgical care in the VA. Eighty-eight additional clinical nurse reviewers were recruited to ensure the continued reliability of the data collected from all of these centers. Today, the VA NSQIP database accrues approximately 105,000 major operations annually. As in the risk study, a total of 78 demographic, clinical, and outcome variables are collected on each major operation, in addition to 22 preoperative and postoperative laboratory variables. As of September 30, 2001, the NSQIP clean-file database comprised complete data on more than 835,000 operations in nine surgical specialties. General surgery made up 28%, followed by orthopedics at 19% and urology at 14%. Cardiac cases were 9%, and thoracic non-cardiac, 4%. Figure 3 exemplifies the importance of risk adjustment of outcomes in the assessment of the quality of care at specific institutions. The hospitals shown were ranked according to their unadjusted mortality rate on the left side and their O/E ratio, i.e., the risk-adjusted mortality rate, on the right side. Note how risk adjustment changed the rank of the majority of these hospitals. One hospital, for example, changed from being the seventh from the top to being the sixth from the bottom. We like to refer to this figure as the “Railroad Track” figure. If risk adjustment of mortality did not have an effect, the figure would look like a railroad track because the lines between the two sets of ranks would be parallel.
Clearly the figure is far from looking like a railroad track! Annual reviews of the outlier status of participating hospitals have demonstrated that an error rate of 60% or more can be committed in designating a hospital as an outlier when hospitals are compared by the unadjusted mortality rate instead of the O/E ratio.
Fig 3Changes in hospital ranks after risk adjustment for the 30-day mortality rate. The left-hand column shows the rank of the 44 hospitals by unadjusted postoperative mortality rate, with the hospital listed in the first position having the lowest rate. The right-hand column shows the rank of the same hospitals based on their risk-adjusted observed-to-expected (O/E) ratio for postoperative mortality. Each hospital is connected with a line that demonstrates the change in rank order after risk adjustment. (Adapted from Khuri and associates [3] by permission of the American College of Surgeons.)
The backbone of the NSQIP, and its most important asset, is the feedback that it provides on a regular basis to the providers in the field. This feedback has numerous components, the most important of which are detailed periodic reports that provide the surgeons and managers with hospital-specific information on workload, risk-adjusted outcomes, patient risk profiles, and length of stay—all in comparison with national averages and peer group data. These data are used by the surgeons in the field to devise specific quality improvement protocols, such as the one described by Neumayer and colleagues in 2000. Feedback from the executive committee is also provided in the form of facility-specific recommendations and different levels of concern about the high outliers, generated during a 2-day meeting in which the executive committee performs a detailed review of the risk-adjusted outcomes in all the hospitals. Instruments for conducting self-evaluation, internal reviews, and site visits have been developed and made available as part of the feedback to all participating surgical services. The NSQIP also conducts, on a regular basis, consultative structured site visits that are specifically requested by surgeons and managers in the field. Most importantly, the NSQIP regularly disseminates the best practices identified in low-outlier facilities and in facilities that have demonstrated significant improvements over time. These best practices constitute an important part of the chief of surgery’s annual report. Figure 4 illustrates what the NSQIP is all about: institutional clinical data are reliably collected and transmitted centrally, where they are processed and fed back to the providers so as to effect continuous quality improvement. The NSQIP does not use risk-adjusted outcomes as an audit to identify the “bad apple.” It uses risk-adjusted outcomes as a means to assess and improve the quality of surgical care. During the course of the implementation of the risk study and the NSQIP, the overall 30-day mortality rate for all surgery in the VA declined by 27%, and the overall 30-day morbidity rate declined by 45% (Fig 5). Concomitantly, there was a 43% decline in the average postoperative length of stay. These improvements in surgical outcomes occurred while the volume of major surgery, the average preoperative risk of the patients undergoing major surgery, and the average complexity of the operations remained unchanged.
Fig 4The National Surgical Quality Improvement Program (NSQIP) collects data from each of its participating sites, ascertains their cleanliness and reliability, and processes them into comparative risk-adjusted outcomes. These outcomes, along with other information and tools, are continuously fed back to the participating centers for the primary purpose of achieving quality improvement (QI). (Adapted from Khuri SF, Daley J, Henderson WG. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg 2002;137:20–7, by permission of the American Medical Association.)
Fig 5The 30-day postoperative mortality (left) and 30-day postoperative morbidity (right) for all major operations performed in the Department of Veterans Affairs hospitals throughout the duration of the National Surgical Quality Improvement Program data collection process. A 27% decrease in mortality and a 45% decrease in morbidity were observed in the face of no change in the patients’ risk profiles. (FY = fiscal year; Adapted from Khuri SF, Daley J, Henderson WG. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg 2002;137:20–7, by permission of the American Medical Association.)
The national surgical quality improvement program as a vehicle for quality improvement, advocacy, and healthcare policy
Thomas Garthwaite, MD, Under Secretary for Health in the Department of Veterans Affairs and the highest ranking person in the Veterans Health Administration, has repeatedly stated, “In the past, we needed to attend to complaints about surgery more than any other discipline. Since the NSQIP was established, we have not had to spend time on such complaints, because we now have reliable data with which we can properly address these issues.”
The NSQIP has established a peer-review mechanism allowing VA researchers to interrogate and analyze its rich database, and to address important questions related to medical care, healthcare quality, advocacy, and healthcare policy. To date, the program has contributed 33 peer-reviewed journal articles and five book chapters to the literature. It has made 55 presentations at national meetings and is currently conducting 62 research studies. The contributions to the literature and to VA health policy have been wide-ranging. For example, considering that health policy research in the VA was heavily dependent on data obtained from its administrative database, and considering that risk adjustment is critical to all research that uses surgical outcomes as end points, it was important for the NSQIP to determine whether the information contained in the VA administrative database was adequate for proper risk adjustment of outcomes after surgery. To this effect, Best and associates compared patient preoperative variables and 30-day mortality and morbidity in the NSQIP database, designated as the criterion standard, with the corresponding variables in the VA’s Patient Treatment File (PTF), designated as the test criterion. Both the sensitivity and the positive predictive value of the PTF in depicting risk factors and postoperative outcomes were calculated. To justify replacing the NSQIP with the PTF, each of these measurements should have a value equal to or exceeding 0.9. Despite the relative clinical robustness of the VA PTF, these values were nowhere near 0.9! The average sensitivity and positive predictive value of the PTF in depicting the preoperative risk factors were 0.28 and 0.41, respectively. Worse still, the average sensitivity and positive predictive value of the PTF in depicting the postoperative outcomes were 0.17 and 0.18, respectively. This study was instrumental in silencing critics of the NSQIP within the VA who had claimed that data collection in the NSQIP was too expensive and superfluous, and that the same data could be obtained from the PTF.
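The two measures in the Best study are standard 2×2 contingency quantities. A minimal sketch of how they are computed (the flag vectors in the usage test are made up for illustration, not actual PTF or NSQIP data):

```python
def sensitivity_and_ppv(reference, test):
    """Sensitivity and positive predictive value of `test` flags against a
    reference standard. Inputs are parallel sequences of 0/1 indicators of
    whether a given risk factor (or outcome) was recorded for each patient."""
    tp = sum(1 for r, t in zip(reference, test) if r and t)       # true positives
    fn = sum(1 for r, t in zip(reference, test) if r and not t)   # false negatives
    fp = sum(1 for r, t in zip(reference, test) if not r and t)   # false positives
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv
```

Computed per variable, with the NSQIP flags as `reference` and the PTF flags as `test`, and then averaged over variables, quantities of this form underlie the summary figures quoted above.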
Another major NSQIP study addressed a debate that raged in 1997 regarding the regionalization of surgical referrals to high-volume centers. The VA by then had reorganized into 22 new autonomous networks, and a new cost allocation system had been implemented that favored primary over tertiary care. This prompted a number of network directors to recommend closing small-volume surgical services, arguing that better quality of surgical care would prevail in larger volume hospitals. To address this, the NSQIP examined the relationship of surgical volume to outcome in eight common operations: abdominal aortic aneurysm repair, infrainguinal vascular reconstruction, carotid endarterectomy, lung resection, open and laparoscopic cholecystectomy, colectomy, and total hip arthroplasty. Four types of statistical analyses showed no relationship between the 30-day mortality O/E ratio and procedure volume in any of the eight operations examined. Automatic interaction detection analysis also failed to identify a volume threshold below which risk-adjusted 30-day mortality was adversely affected in any of the eight operations. In this study, for example, the hospital with the highest volume of colectomies, 52 cases per year, was one of the three high-outlier hospitals, whereas a hospital with fewer than one-third of these cases was the lowest outlier hospital, i.e., the best performer in the whole group! In the face of these compelling data, managers in the VA could no longer invoke quality improvement as a justification for the closure of small-volume surgical centers. More importantly, these managers have accepted NSQIP risk-adjusted outcomes as the measures of quality of surgical care and as the basis for major decision making regarding surgery.
The debate over whether increased volume improves the outcomes of surgery is unlikely to be settled soon. Almost all major studies that have found a direct relationship between volume and outcome in non-cardiac surgery have been based on administrative and claims databases. The NSQIP studies, which have repeatedly failed to show such a relationship, raise serious questions about the validity of the risk adjustment in studies that use administrative and claims databases. More importantly, the NSQIP studies have repeatedly underscored the fact that quality resides in systems of care, and that referral centers with high volumes of surgery may exhibit good quality of care not because of high volumes per se but because these large referral centers generally have good systems of care. By providing a direct outcome-based measure of quality of surgical care, the NSQIP has eliminated the need in the VA to use volume of surgery as a proxy measure of quality, as the Leapfrog Group did in setting its standards for the private sector.
A vision for the future of the national surgical quality improvement program
Within the scope of its overall strategic plan, the NSQIP is still in its infancy and will require several years to mature into a fully comprehensive system for the comparative assessment and improvement of the quality of surgical care. It uses only one of the three quality-related dimensions of health care: outcome, and within that, only 30-day morbidity and mortality (Fig 6A). The strategic plan of the NSQIP (Fig 6B) calls for the incorporation of additional instruments and tools for the measurement of postoperative functional status, quality of life, and patient satisfaction. The ability of the NSQIP to measure risk-adjusted outcomes reliably gives it a unique opportunity to identify elements within the other two dimensions of health care, process and structure, that can be used as meaningful measures of the quality of surgical care. An article in the Wall Street Journal, published in December 2001, underscored the failure of the Joint Commission on Accreditation of Healthcare Organizations’ accreditation process to depict quality of care, because the commission relied almost exclusively on arbitrary processes and structures that had not been shown to relate meaningfully to outcomes. Only processes and structures that have been demonstrated to affect surgical outcome should be used as measures of quality of care. Cost also cannot be excluded from any quality measurement system. Here again, by relating cost to risk-adjusted outcomes, the NSQIP should be able to provide, perhaps for the first time, meaningful measures of cost-effectiveness, thus completing, hopefully in the near future, the big picture of comprehensive quality improvement shown in Figure 6B.
Fig 6(A) Today, the National Surgical Quality Improvement Program is achieving quality improvement (QI) through measurement and feedback to providers of risk-adjusted mortality and morbidity. (B) The vision for tomorrow is to achieve more comprehensive quality assessment and improvement by incorporating additional measures of outcome, such as quality of life (Q of Life), functional status, and patient (Pt.) satisfaction; measures of outcome-related structures and processes of care; and measures of cost-efficiency that are defined in terms of the relationship between cost and outcome of care. (Adapted from Khuri SF, Daley J, Henderson WG. The comparative assessment and improvement of quality of surgical care in the Department of Veterans Affairs. Arch Surg 2002;137:20–7, by permission of the American Medical Association.)
Should and can there be a national surgical quality improvement program outside the department of veterans affairs?
Should and can there be an NSQIP outside the VA? The surgical community today has as many reasons to set up an NSQIP-like program as did the VA surgeons 15 years ago. Outcomes without risk adjustment continue to be used (and abused) by the lay press as ipso facto measures of quality of surgical care. The NSQIP has shown that the use of unadjusted outcomes can lead to an error in judging the quality of a specific hospital in 60% of the cases. Consumer groups are now invoking the Freedom of Information Act and publishing on the Internet grades of hospitals and providers, based on partial administrative and claims databases that preclude adequate risk adjustment. After visiting www.healthgrades.com, who would like to be treated at a less than 5-star hospital?! The US News and World Report provides an annual listing of the “best” hospitals that has the trust of only those of us whose hospitals make it to the list! Those who do not make it should be reassured that risk-adjusted outcomes are not among the performance criteria used in this report. The alarming trend among various states to rate individual surgeons on the basis of their outcomes, which started in cardiac surgery, is now rapidly expanding to other fields, and all types of report cards are being proposed to grade surgeons, mostly with little or no input from the surgical community. Surgical specialty boards are having a hard time with the mandate issued by the Accreditation Council for Graduate Medical Education and the American Board of Medical Specialties that each specialty board should develop measures of provider competence to be incorporated into the certification and recertification processes. In the VA, we have discouraged the generation of surgeon-specific risk-adjusted outcomes mainly because of two serious pitfalls: first, the average surgeon does not perform enough operations annually to provide a statistically meaningful sample size for the generation of stable O/E ratios. 
Second, and probably more importantly, one cannot separate the performance of a provider from that of his or her institution, because quality is highly dependent on institutional systems. The most competent surgeon will have poor outcomes in inferior systems of care. For this and many other reasons, outcome-based individual report cards have very little value in quality improvement. They will harm NSQIP-like efforts because they alienate and disenfranchise the surgeons in the field. If the Accreditation Council for Graduate Medical Education mandate should result in the development of outcome-based report cards for surgeons, it is imperative that the quality of the surgeon and that of his or her institution be measured interdependently—another reason for setting up an NSQIP-like program nationally.
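The small-numbers problem can be made concrete with a quick simulation. Suppose a surgeon is exactly average (true O/E = 1.0) and operates on patients with a 3% expected mortality; the case volumes, mortality rate, and simulation parameters below are illustrative assumptions, not NSQIP data.

```python
import random

def oe_interval(n_ops, p=0.03, trials=2000, seed=0):
    """Empirical 5th-95th percentile range of O/E ratios arising from chance
    alone for a provider whose true quality is exactly average (O/E = 1)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        # simulate 30-day deaths among n_ops operations at true risk p
        deaths = sum(1 for _ in range(n_ops) if rng.random() < p)
        ratios.append(deaths / (n_ops * p))  # expected deaths = n_ops * p
    ratios.sort()
    return ratios[int(0.05 * trials)], ratios[int(0.95 * trials)]
```

At roughly 100 operations a year, the chance-alone range of O/E spans from about 0 to about 2, so an individual surgeon's ratio is mostly noise; an institution pooling thousands of cases yields a far tighter interval around 1.0.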
Most alarmingly, in the absence of an authoritative professional surgical organization that sets national standards, industry and managed care are setting our standards by default—standards that may be harmful to the surgical community. A recent study by Birkmeyer and associates attempted to justify the Leapfrog Group’s volume standards by calculating, from published studies, the number of lives that would have been saved had the patients in those studies been referred exclusively to high-volume hospitals. Seventy-five percent of the lives saved were calculated on the basis of two studies that had shown a direct relationship between volume and outcome: one in abdominal aortic aneurysmectomy in the VA, and one in coronary artery bypass grafting in the state of New York. The VA study cited by Birkmeyer and coworkers had relied on the VA’s administrative database for risk adjustment. When the NSQIP data on the same patients were analyzed a few years later, no relationship between volume and outcome could be elicited [
]. Likewise, the New York state study cited by Birkmeyer and colleagues was supplanted a few years later by another study from the same group, which showed no relationship between volume and outcome of coronary artery bypass grafting [
]. What is most alarming about this exercise is not that the volume standards set for the surgical community by the Leapfrog Group were ill-grounded scientifically, but that we now depend on Microsoft and General Motors to set our standards of care—a point which I underscored in an editorial that accompanied the article by Birkmeyer and associates. Surgeons, and only surgeons, need to set standards of surgical care, and for that they will need an NSQIP.
Many have said that the NSQIP would not work in the private sector, where there would be no central authority to mandate it, where it would be expensive to hire dedicated nurses for data collection, and where the patient population and the predictive models would be different. It was not by any means a central mandate that brought about the NSQIP. In fact, the NSQIP in its first 4 years was fiercely fought by some senior managers in the VA, who almost succeeded in killing it. It was the chiefs of surgery in the field who willed it and made it happen, because they had realized, as a group, that they would not be able to advocate for themselves or withstand the onslaught of byzantine policies imposed on them without proper data—a realization that, Fred Grover tells me, has become a driving force for The Society of Thoracic Surgeons database as well. Of course, there were many chiefs of surgery in the VA who were initially skeptical and viewed this as “Big Brother” breathing down their necks. The argument that won over these skeptics is one that is very apropos to the surgical community as a whole today: if we, ourselves, do not do this, somebody else will do it for us, and you can be sure they will not do it better. If the will is present in the private sector, we have enough professional organizations in surgery that can provide the necessary mandate.
Is the NSQIP too expensive to be applied in the private sector? It is not. The total annual expenditure of the program, including the salaries of the nurses in the 128 participating hospitals, is less than $5 million, averaging $38 per major operation assessed—nearly the cost of two 7-0 Prolene sutures! More important than cost in a program of this nature is perceived value. Providers will partake in an NSQIP only if they find value in it. Chiefs of surgery in the VA have found value in obtaining reliable comparative data that characterize the quality of their performance and enable them to evaluate and improve the systems of care at their local facilities. There is value in the NSQIP because, unlike in industry and manufacturing, where quality generally costs more, in health care quality costs less, because it prevents costly morbidity. One of the interesting current studies of the NSQIP is an assessment of the savings to the VA realized by a 47% decrease in morbidity during the 10 years since the inception of the NSQIP. We estimate these savings to be in the billions of dollars—certainly much more than what the VA has spent on the NSQIP. A program that is designed to and can effectively improve the quality of surgical care cannot be too expensive.
To answer the question of whether the NSQIP predictive models were applicable to non-VA populations, an NSQIP Private Sector Initiative (PSI) was started more than 2 years ago, involving three non-VA institutions: the departments of surgery at the University of Michigan in Ann Arbor, the University of Kentucky in Lexington, and Emory University in Atlanta. A dedicated nurse was trained at each of these facilities, and new software was developed that allowed the nurses to collect and transmit the data through the Internet, the data collection instrument itself being identical to the one used in the VA, but limited to general and vascular surgery. After a year and a half of data collection, predictive models were built based on the VA data alone, the PSI data alone, all the data combined, and the VA top 10 predictors only. All these models had high C-indices, indicating excellent predictability. When the VA top 10 predictors model, the simplest, was applied to the PSI data, it yielded a C-index of 0.95, as high as that of the model based on the PSI data alone. Considering that 1.0 is perfect predictability, these results clearly indicated that the VA models were very applicable to the patient populations of these three non-VA medical centers. Encouraged by these results, the NSQIP partnered with the American College of Surgeons, and together we recently secured a $5.2 million grant from the Agency for Healthcare Research and Quality to investigate, in the 128 VA surgical services and 10 additional private sector institutions, the efficacy of the NSQIP as a reporting system to improve patient safety in surgery. One of the main objectives of this study is to explore further the applicability of the NSQIP to the private sector, in the hope of opening up the NSQIP in the future to the surgical community at large.
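For readers unfamiliar with the C-index cited above, the statistic can be sketched generically as follows. This is an invented toy illustration of the concordance statistic, not the NSQIP's actual models or computation: for every pair consisting of one patient who had the outcome and one who did not, the pair is concordant when the model assigned the higher risk to the patient with the outcome.

```python
# Hypothetical sketch of the C-index (concordance statistic), a measure of
# how well a model's predicted risks discriminate events from non-events.
def c_index(outcomes, risks):
    """outcomes: 1/0 per patient; risks: model-predicted probabilities."""
    pairs = 0.0
    concordant = 0.0
    for i, (oi, ri) in enumerate(zip(outcomes, risks)):
        for oj, rj in zip(outcomes[i + 1:], risks[i + 1:]):
            if oi == oj:
                continue  # only pairs with different outcomes are informative
            pairs += 1
            event_risk = ri if oi == 1 else rj      # risk given to the event
            nonevent_risk = rj if oi == 1 else ri   # risk given to the non-event
            if event_risk > nonevent_risk:
                concordant += 1
            elif event_risk == nonevent_risk:
                concordant += 0.5                   # ties count as half
    return concordant / pairs

# A model that ranks every event above every non-event scores a perfect 1.0;
# a model no better than chance scores about 0.5.
print(c_index([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2]))  # prints 1.0
```

On this scale, the 0.95 reported above means the VA top 10 predictors model ranked the higher-risk patient above the lower-risk one in roughly 95% of such pairs in the PSI data.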
The NSQIP will be applicable to the private sector, as a comprehensive tool for the assessment and improvement of the quality of care in all of surgery, if, and only if, it retains in the private sector its most essential ingredient: the trust of the surgeon in the field and his or her pride in its accomplishments. When a system like the NSQIP gains the trust of surgeons as a means by which they can reliably identify and study the strengths and weaknesses in the quality of the care they deliver, they are much more likely to participate in it, and much less likely to game it, than in a system bent on providing an audit primarily to identify high outliers and poor performers. The surgical community does not need another audit system. It needs a trustworthy, outcome-based, data-driven quality improvement program.
Conclusion: the common thread
In conclusion, I have presented to you this morning the NSQIP as the first national, validated, outcome-based, risk-adjusted, and peer-controlled state-of-the-art program for the measurement and enhancement of the quality of surgical care. I have shown you how this system evolved from the need of VA surgeons in the mid-1980s to advocate for themselves against disenfranchisement and adverse policy. I have drawn a parallel between what VA surgeons faced then and what the surgical community at large is facing today—while trying to underscore the need for the surgical community as a whole to partake in a valid, truly national system for measuring and enhancing the quality of surgical care—by surgeons, for surgeons. Only with such a system will we be empowered to shape our destiny and the fate of our profession.
Finally, fellow colleagues, we will never be able to measure reliably the quality of surgical care, or advocate effectively for our profession and against adverse healthcare policies, without the common denominator—the thread that weaves through them all: reliable data. Last year, Woodrow Myers, MD, then Director of Health Care Management of the Ford Motor Company, addressed members of the American Board of Surgery during a retreat dedicated to a discussion of the measurement of surgeon competence. After presenting the data analyses that formed the basis for the referral of Ford employees to specific healthcare providers, he must have guessed what most of us in the audience were thinking, and said, “Some of you are saying to yourselves, ‘but these are flawed data!’ Yes, they are, in part, flawed data, but we will continue to use flawed data until you, the surgical community, provide us with better data.” It never ceases to amaze me how far we, as a profession, are willing to go to ensure the reliability and accuracy of the data we submit to basic peer-reviewed journals every day, yet we are mostly oblivious to the quality of the data that actually determine our livelihood and the very nature of our profession. It is not enough to view data as “medicine’s new weapon,” as Business Week put it. Only reliable data are medicine’s new weapon. Unreliable data are a weapon that has hurt, and continues to hurt, us immeasurably as surgeons and healthcare professionals.
Thank you, President Orringer and fellow colleagues, for giving me the splendid honor of delivering the 2002 Thomas B. Ferguson Lecture.
Risk adjustment of the postoperative mortality rate for the comparative assessment of the quality of surgical care. Results of the National VA Surgical Risk Study.
Risk adjustment of the postoperative morbidity rate for the comparative assessment of the quality of surgical care. Results of the National VA Surgical Risk Study.