
A recent study published in the Journal of the American Medical Informatics Association (JAMIA) demonstrates the successful use of an algorithm to aggregate and classify Electronic Health Record (EHR) visits generated from varied, site-specific operational rules into “macrovisits.” Macrovisits are a promising solution to address the heterogeneity of encounter data and enable more accurate and reliable analyses of real-world data, allowing for more definitional flexibility for projects and research questions with different needs.
EHR data have become an increasingly valuable source of information for clinical research, allowing researchers to analyze real-world data to improve patient care and advance medical knowledge. However, the heterogeneity of EHR data poses a significant challenge and limits how fully these data can be used for research.
In a study published in the Journal of the American Medical Informatics Association (JAMIA), researchers demonstrated the successful use of an algorithm to aggregate and classify EHR visits generated from varied, site-specific operational rules into “macrovisits.” Macrovisits serve a narrower purpose than pre-existing clinical service aggregation methods such as bundles and care episodes, which usually link care services for the same medical situation over various periods to support bundled payment models. Macrovisits instead link encounters together to fully represent the services experienced during a discrete hospitalization.
The study used encounter data from 75 partner sites harmonized to a common data model (CDM) as part of the National COVID Cohort Collaborative (N3C) project. The N3C is an initiative of the National Institutes of Health Researching COVID to Enhance Recovery program. The study found that atomic inpatient encounter data varied widely between sites in both length of stay (LOS) and the number of CDM measurements per encounter. After encounters were aggregated into macrovisits, the variance in LOS and measurement counts declined. A subsequent algorithm to identify hospitalized macrovisits reduced data variability further.
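To make the idea of aggregation concrete, the following is a minimal Python sketch of one way overlapping encounters could be merged into macrovisits. It is not the study's algorithm: the overlap-only merge rule, the tuple layout (patient_id, start, end), and the function names build_macrovisits and length_of_stay are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date


def build_macrovisits(encounters):
    """Merge each patient's overlapping encounters into macrovisits.

    `encounters` is an iterable of (patient_id, start, end) tuples.
    Assumption: two encounters belong to the same macrovisit when their
    date ranges overlap. Real site rules are likely more involved.
    """
    by_patient = defaultdict(list)
    for patient_id, start, end in encounters:
        by_patient[patient_id].append((start, end))

    macrovisits = []
    for patient_id, spans in by_patient.items():
        spans.sort()  # order by start date, then end date
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                # Overlaps the running macrovisit: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the running macrovisit, start a new one.
                macrovisits.append((patient_id, cur_start, cur_end))
                cur_start, cur_end = start, end
        macrovisits.append((patient_id, cur_start, cur_end))
    return macrovisits


def length_of_stay(macrovisit):
    """Return LOS in whole days for a (patient_id, start, end) macrovisit."""
    _, start, end = macrovisit
    return (end - start).days


if __name__ == "__main__":
    sample = [
        ("p1", date(2021, 1, 1), date(2021, 1, 3)),
        ("p1", date(2021, 1, 2), date(2021, 1, 6)),
        ("p1", date(2021, 2, 10), date(2021, 2, 11)),
    ]
    for mv in build_macrovisits(sample):
        print(mv[0], mv[1], mv[2], length_of_stay(mv))
```

In this sketch, the two overlapping January encounters collapse into one macrovisit with a five-day LOS, while the February encounter stays separate. Computing LOS on the merged spans rather than on the atomic encounters is the kind of step that would be expected to reduce between-site variance, though the actual rules used in the study may differ.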
The authors emphasized that it is essential to assess encounter heterogeneity and to develop methods for aggregating encounters into larger hospitalizations, because using raw visit data as-is misleads many analyses. For example, using raw inpatient visits to identify hospitalizations in N3C data led to undercounting of severe COVID-19 cases. Combining visits into macrovisits post hoc instead allows for more definitional flexibility for projects and research questions with different needs.
While the rule-based algorithms used in the study lack the “apparent dynamism of a machine learning-based approach,” they are key steps in quantifying EHR data heterogeneity and creating solutions to harmonize data, the authors said. The authors also noted that some question the need for this type of work, on the perception that these issues will be resolved by CDM harmonization pipelines or by new interoperability paradigms such as HL7 FHIR.
“FHIR accounts for the possibility of aggregating encounters with the partOf element in the Encounter resource,” the authors wrote. “However, because this is not a required field in FHIR, it remains to be seen what proportion of FHIR-ready sites will choose to use this element and how much variation will be seen in its use.”
“Similarly, the experience of working across N3C, the largest harmonized CDM repository in the country, has demonstrated that the CDM harmonization mechanisms currently in place are not sufficient to harmonize encounter data,” they added.
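As an illustration of the FHIR mechanism the authors mention, the following Python sketch groups Encounter resources by following their partOf references up to a root encounter. It assumes simplified, already-parsed FHIR JSON held as Python dicts, and relative references of the form "Encounter/<id>"; the function names and sample ids are hypothetical. Encounters that omit partOf, which FHIR permits, simply end up in a group of their own, which is the gap the authors describe.

```python
from collections import defaultdict


def root_encounter_id(enc_id, parent_of):
    """Follow partOf links upward until an encounter with no parent is found."""
    seen = set()
    # The `seen` set guards against infinite loops on malformed cyclic links.
    while enc_id in parent_of and enc_id not in seen:
        seen.add(enc_id)
        enc_id = parent_of[enc_id]
    return enc_id


def group_by_part_of(encounters):
    """Group Encounter ids under the id of their top-level parent.

    `encounters` is a list of dicts shaped like FHIR Encounter JSON.
    Assumption: partOf.reference is a relative reference, "Encounter/<id>".
    """
    parent_of = {}
    for enc in encounters:
        ref = enc.get("partOf", {}).get("reference")
        if ref:
            parent_of[enc["id"]] = ref.split("/")[-1]

    groups = defaultdict(list)
    for enc in encounters:
        groups[root_encounter_id(enc["id"], parent_of)].append(enc["id"])
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"resourceType": "Encounter", "id": "hosp-1"},
        {"resourceType": "Encounter", "id": "ed-1",
         "partOf": {"reference": "Encounter/hosp-1"}},
        {"resourceType": "Encounter", "id": "clinic-9"},
    ]
    print(group_by_part_of(sample))
```

A site that populates partOf consistently would yield hospitalization-level groups directly from a pass like this one. A site that leaves it empty would still need post hoc aggregation of the kind the study describes.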
The authors suggested that it would be worthwhile to consider CDM schema extensions to facilitate loading hospitalization and hospital facility data and groupings that already exist in EHR platforms, such as the “account” concept in the Epic EHR. While these concepts are unlikely to provide complete solutions to the visit issues described and would likely have heterogeneity both within and between sites, they offer a significantly more evolved and refined mechanism for dealing with hospitalizations from the EHR.