Probabilistic data linkage: a case study of comparative effectiveness in COPD
Abstract
Background: Comparative effectiveness research is often constrained by the limitations of existing data sources, and researchers are investigating new techniques to overcome them. We describe probabilistic data linkage as one means of addressing these limitations.
Methods: We employed a historical retrospective cohort design. Patients aged 40 years and older with a principal or secondary diagnosis of COPD (ICD-9-CM codes 491.xx, 492.xx, and 496) and at least 3 years of continuous enrollment between January 1, 2004 and April 30, 2009 were selected from two US-based commercial administrative claims databases. The index date was the date of the first claim for a study drug, with "first" defined by a 12-month pre-index wash-out period; for illustration purposes, the study drugs are referred to as Treatment 1 and Treatment 2. The primary effectiveness measure was the risk of any COPD-related exacerbation in the 12-month post-index period; baseline characteristics were identified in the 12-month pre-index period.
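The selection rules above can be illustrated with a minimal Python sketch. It is not the study's actual code: the function names, the dotted ICD-9-CM string format, the input layout (a list of study-drug claim dates plus enrollment start and end dates per patient), and the approximation of 12 months as 365 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: a 12-month window is approximated as 365 days.
WINDOW = timedelta(days=365)


def is_copd_code(icd9: str) -> bool:
    """True for ICD-9-CM 491.xx, 492.xx, or 496 (dotted format assumed)."""
    code = icd9.strip()
    return code.split(".")[0] in {"491", "492"} or code == "496"


def find_index_date(drug_claim_dates, enroll_start, enroll_end):
    """Return the candidate index date, or None if the windows are not covered.

    The index date is the earliest observed study-drug claim. Because it is
    the earliest claim, no study-drug claim can precede it, so the wash-out
    check reduces to requiring enrollment for the full 12 months before it.
    The 12-month post-index follow-up must also fall within enrollment.
    """
    if not drug_claim_dates:
        return None
    index = min(drug_claim_dates)
    if enroll_start <= index - WINDOW and index + WINDOW <= enroll_end:
        return index
    return None


# Hypothetical patient enrolled for the whole study period.
idx = find_index_date(
    [date(2006, 5, 1), date(2006, 3, 1)],
    enroll_start=date(2004, 1, 1),
    enroll_end=date(2009, 4, 30),
)
```

A patient whose first study-drug claim falls less than 12 months after enrollment begins yields `None`, since the wash-out period cannot be fully observed.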
Results: Among patients receiving Treatment 1 at index, 39.3% in Database A and 39.7% in Database B had an exacerbation; for Treatment 2, the percentages were 46.3% and 47.1%, respectively. The event rate of hospitalizations was nearly identical across the two database samples, as were the odds ratios and corresponding confidence intervals from the adjusted logistic regression models (OR: Database A, 0.72; Database B, 0.74; Database A with imputed outcomes, 0.72).
Conclusions: The probabilistic linkage demonstrated that patients from different databases who are matched on similar pre-index characteristics may have similar outcomes in the post-index period.
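As a rough plausibility check on the reported figures, the unadjusted (crude) odds ratios can be computed directly from the exacerbation percentages. This sketch is not the adjusted logistic regression used in the study, and it assumes the reported odds ratios compare Treatment 1 against Treatment 2; the function names are illustrative.

```python
def odds(p: float) -> float:
    """Convert a probability into odds."""
    return p / (1.0 - p)


def crude_odds_ratio(p_treatment_1: float, p_treatment_2: float) -> float:
    """Unadjusted odds ratio of Treatment 1 versus Treatment 2 (assumed direction)."""
    return odds(p_treatment_1) / odds(p_treatment_2)


# Exacerbation proportions reported in the Results.
or_a = crude_odds_ratio(0.393, 0.463)  # Database A
or_b = crude_odds_ratio(0.397, 0.471)  # Database B
print(f"Database A crude OR: {or_a:.2f}")
print(f"Database B crude OR: {or_b:.2f}")
```

The crude values come out at about 0.75 and 0.74, close to the adjusted estimates of 0.72 and 0.74. Covariate adjustment would be expected to shift the estimates somewhat, so exact agreement is not anticipated.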