Patient Matching Example
Summary
This section walks through an example patient matching project and the steps taken to implement its patient matching strategy.
Project profile: DISI MVP
The DISI MVP is a reference project that implements an HIE solution with a Client Registry, a Shared Health Record, an Interoperability Layer, and Data Analysis and Visualisation. The data will be captured from OHRI systems located at various health facilities. OHRI is an OpenMRS EMR that implements the HIV CBS use case.
The goal of this example is to follow the patient matching process in order to configure the Client Registry so that it matches patients successfully.
Profile
The profile is as follows:
- Use cases required:
  - CR (HIE)
  - Evaluation
- Scale: low
- Algorithm required:
  - Low quality data and available skills => Fellegi-Sunter & EM
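To illustrate what the Fellegi-Sunter choice implies, here is a minimal Python sketch of how per-field agreement is turned into a match score. The m- and u-probabilities and the field names below are hypothetical values chosen for illustration; in practice the EM algorithm would estimate them from the data, and this sketch does not reproduce how OpenCR or Fastlink implement the model.

```python
import math

# Hypothetical m- and u-probabilities per field (illustrative values only).
# m = P(field agrees | the two records belong to the same patient)
# u = P(field agrees | the two records belong to different patients)
M_U = {
    "given_name": (0.90, 0.05),
    "family_name": (0.92, 0.03),
    "date_of_birth": (0.95, 0.01),
    "gender": (0.98, 0.50),
}


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio contributed by a single field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(agreement: dict) -> float:
    """Sum the field weights for one candidate pair of records."""
    return sum(field_weight(agreement[f], m, u) for f, (m, u) in M_U.items())


all_agree = match_score({f: True for f in M_U})
all_disagree = match_score({f: False for f in M_U})
print(round(all_agree, 2), round(all_disagree, 2))
```

Fields that rarely agree by chance (a low u, such as date of birth) contribute the most weight when they agree, while a field like gender adds little on agreement but counts heavily against a pair when it disagrees.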
Selecting patient matching tools
From the project profile identified, the tools selected for this project are:
Production tool: OpenCR
Evaluation tool: Fastlink R Notebook
Fastlink configuration
Data analysis
There is no production data for this project. The structure of the data can be found in the Minimum Data Set.
As part of this data analysis we want to determine the following:
- What identifiers to use from all the captured data
  - Person unique identifiers: national ID and phone number
  - Other demographic identifiers: given name, family name, date of birth, gender and city
  - Source system ID: OHRI patient identifier
- What pre-processing is required before records from different data sources can be compared
  - For this project, we are assuming that pre-processing is not required.
- What are the characteristics and quality of those identifiers, and what types of errors are present when the data is captured
  - The Scenario 3: Low data quality configuration will be used.
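The identifiers chosen in this analysis can be pictured as a simple record structure. The Python sketch below is only an illustration: the class name, field names and sample values are assumptions and are not taken from the Minimum Data Set. It compares two records field by field with exact equality, which is consistent with the assumption that no pre-processing is applied.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PatientRecord:
    # Field names are hypothetical; they mirror the identifiers listed above.
    source_id: str                # OHRI patient identifier (source system ID)
    national_id: Optional[str]    # person unique identifier
    phone_number: Optional[str]   # person unique identifier
    given_name: str
    family_name: str
    date_of_birth: str            # assumed to be an ISO 8601 date string
    gender: str
    city: str


# The source system ID identifies the record, not the person, so it is excluded.
MATCH_FIELDS = [f.name for f in fields(PatientRecord) if f.name != "source_id"]


def comparison_vector(a: PatientRecord, b: PatientRecord) -> dict:
    """Exact agreement per field; None when either value is missing."""
    vector = {}
    for name in MATCH_FIELDS:
        x, y = getattr(a, name), getattr(b, name)
        vector[name] = None if not x or not y else x == y
    return vector


# Made-up sample records, for illustration only.
rec_a = PatientRecord("ohri-001", "ID123", None, "Thandi", "Nkosi", "1990-04-12", "F", "Durban")
rec_b = PatientRecord("ohri-245", "ID123", "0820000000", "Thandi", "Nkozi", "1990-04-12", "F", "Durban")
print(comparison_vector(rec_a, rec_b))
```

With low quality data, exact comparison marks a small typing error in a name as a disagreement, which is why a string similarity measure is usually applied to the name fields instead.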
Generate a dataset
The generated dataset can be downloaded here. It was produced with the Data Generator Google Colab Notebook, created by Jembi, using the configuration below. More information on data generation can be found here.
Test and choose the optimal configuration
We used the Fastlink R Google Colab Notebook created by Jembi.
The criteria used for the test were:
- Jaro-Winkler as the string similarity algorithm
- A similarity threshold of 0.92 was used
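To make these criteria concrete, the following is a minimal pure-Python sketch of the Jaro-Winkler similarity and of a 0.92 threshold check. It is a textbook-style implementation written for illustration, not the code used by Fastlink or OpenCR. The helper name `is_similar`, the prefix scale of 0.1, and treating a score equal to the threshold as an agreement are assumptions.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1 - score)


THRESHOLD = 0.92  # similarity threshold used in this test


def is_similar(s1: str, s2: str, threshold: float = THRESHOLD) -> bool:
    """Hypothetical helper: treat two values as agreeing at or above the threshold."""
    return jaro_winkler(s1, s2) >= threshold


for a, b in [("MARTHA", "MARHTA"), ("DWAYNE", "DUANE")]:
    print(a, b, round(jaro_winkler(a, b), 4), is_similar(a, b))
```

With this threshold, a single transposition in a six-letter name ("MARTHA" vs "MARHTA") still counts as an agreement, while a pair that differs more substantially ("DWAYNE" vs "DUANE") does not.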
Test in production tool
We applied the obtained configuration in OpenCR.