Monday, September 26, 2016

Calculating Shared Savings: Administrative Formulas Versus Research-Based Evaluations

Shared savings lie at the core of many recent health care payment and delivery reforms, most prominently those involving accountable care organizations (ACOs). How savings are distributed between payers and providers depends crucially on determining whether savings were generated by new provider activities and, if so, how large those savings were. Two approaches to savings assessment are commonly applied: 1) administrative formulas and 2) research-based evaluation.

While the two approaches are related, they use very different methods to achieve different goals. Although the main conclusions often overlap, there are times when administrative formulas and research-based evaluations can produce strikingly different results, provoking substantial confusion and consternation among stakeholders. This post looks at the advantages and disadvantages of the two approaches and discusses how they can be melded to best serve administrative, payment, and research needs.

A case study in divergent results

In the Comprehensive Primary Care (CPC) Initiative, the Centers for Medicare and Medicaid Services (CMS) used a set of administrative formulas (similar to those used in the Medicare Shared Savings Program (MSSP) and Pioneer ACO Program) to determine whether participating primary care practices earned shared savings. To obtain an independent and more comprehensive assessment of the initiative, CMS contracted with Mathematica Policy Research to conduct a thorough research-based evaluation of the program.

In four of the seven CPC regions, the findings from administrative formulas and research-based evaluation were broadly consistent in showing gross savings or losses (i.e., before accounting for care management fees). But in the remaining three regions, the findings were inconsistent. In Colorado, the administrative formulas gave credit for 1.3 percent savings, while the evaluation found neither savings nor losses. In New Jersey and New York, the administrative formulas recorded losses (i.e., expenditure increases) of 0.7 percent and 3.8 percent, respectively. But according to the Mathematica evaluation, these states achieved savings of 4 percent and 2 percent, respectively (though only New Jersey's savings were statistically significant).

Of course, the only results that matter for distributing financial rewards are those derived from the administrative formulas. Thus, it is not hard to imagine why providers in New Jersey and New York would take the lead in questioning why the two approaches produced such divergent results and which set of numbers is really "right."

Anatomy of two methods

The potential for discrepancy and controversy between the two savings assessment approaches is not unique to the CPC Initiative. Thus, it is important to understand the different mechanics and purposes of the two approaches.

The administrative formulas used in the CPC Initiative determined whether the CPC practices produced spending levels that were low enough to give the practices credit for generating savings. Savings were established if per capita spending among patients attributed to CPC practices was lower than corresponding spending in a reference population of patients in the same region who met the CPC criteria but were not attributed to a CPC practice. To isolate the effect of the initiative as clearly as possible, the spending amounts were multiplied by ratios that account for changes in casemix (e.g., disabled, non-disabled), patient risk scores, and secular trend growth in spending.
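
To make the mechanics concrete, the sketch below (in Python) shows the general shape of such a ratio-adjusted comparison. It is an illustration only: the spending amounts and adjustment factors are made up, and the function is not the actual CPC formula.

```python
# A minimal sketch (not CMS's actual formula) of a ratio-adjusted benchmark
# comparison: reference-group spending is scaled by hypothetical casemix,
# risk-score, and secular-trend factors, then compared with spending in the
# attributed (CPC) population to credit gross savings.

def ratio_adjusted_savings(cpc_per_capita, reference_per_capita,
                           casemix_ratio, risk_ratio, trend_ratio):
    """Return credited savings per capita and percent saved vs. the benchmark."""
    # Scale reference spending to reflect the attributed population's casemix,
    # risk profile, and expected secular growth over the performance period.
    benchmark = reference_per_capita * casemix_ratio * risk_ratio * trend_ratio
    savings = benchmark - cpc_per_capita   # > 0 means savings are credited
    return savings, 100.0 * savings / benchmark

# Illustrative (made-up) numbers: roughly a $10,150 benchmark vs. $10,000 actual.
savings, pct = ratio_adjusted_savings(
    cpc_per_capita=10_000.0,
    reference_per_capita=9_800.0,
    casemix_ratio=1.01,   # hypothetical casemix adjustment
    risk_ratio=1.02,      # hypothetical risk-score adjustment
    trend_ratio=1.005,    # hypothetical secular trend adjustment
)
print(f"Credited savings: ${savings:,.0f} per capita ({pct:.1f}%)")
```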

The virtue of this approach is that it is consistent and easy to replicate across CPC sites. Once the formulas are set, they can be populated fairly quickly and easily as new data become available. The drawback is that these formulas employ what are essentially "back-of-the-envelope" adjustments that are not designed to determine whether CPC activities actually caused any real changes in spending.

In contrast, the evaluation produced by Mathematica was designed to determine whether and how specific CPC initiative activities caused any observed changes in spending or other measures such as health care quality and patient and provider experiences. Mathematica's approach was much more comprehensive and elaborate, relying on a combination of claims data; survey data from practices, clinicians, staff, and patients; and qualitative information from site visits, interviews, and observations of practices and payers. For the savings analysis, Mathematica used an econometric technique known as difference-in-differences (DD) analysis, which compares spending trends among CPC-aligned patients to corresponding trends among other similar patients. The DD model also adjusts for potential confounding influences of patient (e.g., demographics, prior diagnosis, and utilization history), practice (e.g., number of clinicians), and market-level (e.g., Medicare Advantage penetration rate) variables.
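
For readers who want to see the general shape of such a model, the sketch below uses Python's statsmodels library on simulated data. The column names, covariates, quarters, and built-in effect size are hypothetical stand-ins for the far richer specification Mathematica used.

```python
# A minimal difference-in-differences (DD) sketch, not Mathematica's actual
# specification. `spend` is per-beneficiary spending, `cpc` flags attribution
# to a CPC practice, `post` flags quarters after the initiative began, and
# `risk_score` stands in for the patient/practice/market covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 4_000
df = pd.DataFrame({
    "cpc": rng.integers(0, 2, n),              # attributed to a CPC practice?
    "quarter": rng.integers(0, 8, n),          # eight hypothetical quarters
    "risk_score": rng.normal(1.0, 0.3, n),     # stand-in covariate
})
df["post"] = (df["quarter"] >= 4).astype(int)  # initiative begins in quarter 4
# Simulated spending with a built-in -$200 effect for CPC patients post-launch.
df["spend"] = (9_000 + 500 * df["risk_score"] + 300 * df["cpc"]
               + 400 * df["post"] - 200 * df["cpc"] * df["post"]
               + rng.normal(0, 800, n))

# The coefficient on cpc:post is the DD estimate of the initiative's effect.
dd = smf.ols("spend ~ cpc * post + risk_score", data=df).fit(cov_type="HC1")
print(f"DD estimate: {dd.params['cpc:post']:.0f} (SE {dd.bse['cpc:post']:.0f})")
```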

The research-based evaluation approach used by Mathematica differs from the approach using administrative formulas in at least two important ways. First, the DD analysis accounts for a much larger set of potential confounding variables and does so in a way that is less rigid than a predetermined ratio. Second, additional efforts were made to ensure that the comparison practices were truly comparable to CPC practices using a statistical matching technique known as propensity score matching.
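
The sketch below illustrates the general idea of propensity score matching at the practice level, using scikit-learn on simulated data. The covariates and the one-to-one nearest-neighbor matching rule are simplifying assumptions, not the evaluation's actual procedure.

```python
# A minimal propensity-score-matching sketch (not Mathematica's procedure).
# Practice-level covariates (`n_clinicians`, `pct_medicare`) are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
practices = pd.DataFrame({
    "cpc": np.r_[np.ones(100, dtype=int), np.zeros(900, dtype=int)],
    "n_clinicians": rng.poisson(5, 1_000),
    "pct_medicare": rng.uniform(0.1, 0.6, 1_000),
})

# 1) Estimate each practice's propensity to participate in CPC.
X = practices[["n_clinicians", "pct_medicare"]]
practices["pscore"] = LogisticRegression().fit(X, practices["cpc"]).predict_proba(X)[:, 1]

# 2) For each CPC practice, find the comparison practice with the nearest
#    propensity score (matching with replacement, for simplicity).
treated = practices[practices["cpc"] == 1]
controls = practices[practices["cpc"] == 0]
nn = NearestNeighbors(n_neighbors=1).fit(controls[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_controls = controls.iloc[idx.ravel()]

# The DD model would then be estimated on treated + matched_controls only.
print(len(treated), "CPC practices matched to", len(matched_controls), "comparison practices")
```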

The virtue of Mathematica's evaluation approach is that it maximizes the scientific reliability and thoroughness of the information obtained. One obvious drawback is that much time is required to conduct all of the analysis. (Mathematica conducted a five-year evaluation.) Another drawback is that even though the methods can be replicated in other settings, each replication involves its own unique features.

For example, if new practices were to enter the program, a new set of matched comparisons would have to be constructed in ways that are not obvious until another layer of analysis is done to assess those practices' unique characteristics. Similarly, while DD models are now very common in health services research, each application of them requires a variety of specification checks to ensure that they are being used properly.
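
As one example of such a check, the sketch below (continuing the hypothetical `df` from the difference-in-differences example above) runs a simple placebo test: it pretends the initiative launched midway through the pre-period and confirms that the estimated "effect" is close to zero.

```python
# A common DD specification check: a placebo test on pre-period quarters only.
# A placebo "effect" far from zero would suggest the treatment and comparison
# groups were not on parallel trends even before CPC began.
import statsmodels.formula.api as smf

pre = df[df["post"] == 0].copy()
pre["fake_post"] = (pre["quarter"] >= 2).astype(int)  # fake launch at quarter 2
placebo = smf.ols("spend ~ cpc * fake_post + risk_score", data=pre).fit(cov_type="HC1")
print(f"Placebo DD estimate: {placebo.params['cpc:fake_post']:.0f}")
```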

The broader issues

The analytic issues raised by the experience of the CPC Initiative can be broadened and generalized as shown in the table below. In summary, the typical administrative formula approach is much more rapid, narrow, and easily implemented across multiple provider groups. The typical research evaluation approach is substantially more thorough and provider group-specific, but also more resource intensive and time consuming.

Table 1. Key Differences between Administrative Formulas and Research-Based Evaluation

Main purpose
- Administrative formulas: Establish clear standards that must be met for providers to earn incentive payments; typically, the same standards are applied to all provider groups in the same program.
- Research-based evaluation: Establish chains of causality to facilitate learning and future program improvements; understand how and why performance varies across different groups of providers.

Timeframe
- Administrative formulas: Short, with all methods accepted upfront and little or no revision after the analysis is done.
- Research-based evaluation: Long, with allowance for critique and revision of methods during the research process.

Analytic approach
- Administrative formulas: One singular approach agreed upon at the beginning of the contracting period; no change in methodology during the contracting period unless the initial methods are found to have large and commonly recognized unanticipated flaws.
- Research-based evaluation: Multifaceted, using mixed quantitative and qualitative methods and examining various modeling assumptions with sensitivity analysis; methods may evolve post hoc (i.e., new hypotheses are generated from initial analyses).

Nature of results
- Administrative formulas: One clear and final result, with no allowance for statistical variation or sensitivity to assumptions.
- Research-based evaluation: Multiple findings that must be triangulated and reported within limits of uncertainty (e.g., confidence intervals).

Accuracy vs. complexity
- Administrative formulas: Generally less accurate and nuanced, due to the imposition of a rapid, singular result to unambiguously trigger incentive payments.
- Research-based evaluation: Generally more accurate, complex, and nuanced.

Judgment in drawing conclusions
- Administrative formulas: Professional judgment is eliminated by a singular predetermined rule for assessing performance.
- Research-based evaluation: Evaluators and reviewers may weigh different aspects of the evidence slightly differently, producing differing perspectives on the final conclusions.

Performance reporting
- Administrative formulas: Short documents with clear results referring to previously released methodological documents; standardized presentations repeating, and potentially clarifying, results from official documents.
- Research-based evaluation: Long final reports, multiple research publications, and/or multiple presentations in research and policy forums.

Toward reduced divergence and confusion

For administrative and research purposes, there is a strong and common interest in getting the "right number" when measuring savings or any other outcomes (e.g., quality improvement). Yet for administration and implementation, speed is also of paramount importance — e.g., providers cannot wait for a five-year evaluation to learn whether and how they will be paid. Finding a middle ground between the administrative and evaluation approaches should, therefore, be a high priority for CMS, health services researchers, and other stakeholders.

For the distribution of shared savings or other similar incentive payments, determining whether spending was reduced is, at least in the short run, a greater priority than determining how or why it was reduced. Thus, it would be valuable to take the core research evaluation techniques used to analyze claims data and incorporate them into the administrative calculations used to give credit for desired provider performance. It would be useful and not too difficult, for example, to replace ratio adjustments with some form of DD analysis as a standard administrative technique.
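
As a rough illustration of what that might look like, the sketch below (continuing the hypothetical `dd` model fitted above) uses the DD estimate and its confidence interval as the trigger for crediting savings. The decision rule shown is purely hypothetical, not CMS policy.

```python
# Hypothetical administrative trigger built on a DD estimate: credit shared
# savings only when the estimated effect is a spending reduction and its
# confidence interval excludes zero.
effect = dd.params["cpc:post"]
ci_low, ci_high = dd.conf_int().loc["cpc:post"]
credits_savings = (effect < 0) and (ci_high < 0)  # reduction, and statistically distinguishable from zero
print(f"Estimated effect: {effect:+.0f} per beneficiary; credit savings: {credits_savings}")
```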

The use of matching techniques for determining comparison groups is more challenging, however, since it is very application specific. For example, some applications would match on characteristics of physician practices, while others would match on characteristics of hospitals or patients. Ultimately, finding the best middle-ground analytic approach depends on the tradeoffs that policymakers and stakeholders are willing to make across time, agency staffing, and contractor resources, and on stakeholders' willingness to defer key analytic decisions that affect their payments to technical experts.

The fundamental differences in timing and purpose will always lead to some divergence in the methods and results of initial administrative calculations and later evaluation studies. Still, actions to reduce this divergence could go a long way in reducing confusion and controversy over provider performance as American health care moves forward with continued innovation and experimentation in payment and delivery reform.



from Health Affairs Blog http://ift.tt/2dwkdg2
