Project Profiling and Choosing Tools
Summary
Patient matching can be implemented in different ways. Each project will have a set of requirements that make it unique. In this section, we will discuss some of the aspects to consider when capturing those requirements, what defines a patient matching project profile, and how to choose the best patient matching tool.
Disclaimer: The information presented here is intended as a guideline. Each project has unique requirements that need to be taken into account, and it is not possible to capture all of them here. The tools and algorithms discussed were selected from our research, but the list is not exhaustive.
Project profile
Before embarking on the task of implementing patient matching, we should define the project profile. The following aspects need to be considered:
- Use cases required
- Scale
- Data quality and algorithms required
Production vs Evaluation tool
Production and Evaluation environments might use the same tool or different ones. As an example, Client Registries (CR) don’t normally include evaluation capabilities, so when CRs are used for production, another evaluation tool that processes records in batch can be used to calculate precision and recall and identify the optimal configuration for the linking.
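As a rough illustration of what a batch evaluation tool computes, the sketch below compares a set of predicted links against a labelled gold standard. The function name and the record identifiers are hypothetical, and it assumes that a labelled set of true links is available for the dataset.

```python
def precision_recall(predicted_links, true_links):
    """Compare predicted record pairs against a labelled gold standard."""
    # Store each pair as a frozenset so (a, b) and (b, a) count as the same link.
    predicted = {frozenset(pair) for pair in predicted_links}
    truth = {frozenset(pair) for pair in true_links}
    true_positives = len(predicted & truth)
    # Precision: share of predicted links that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of true links that were found.
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical record identifiers, for illustration only.
predicted = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]
gold = [("r2", "r1"), ("r3", "r4"), ("r7", "r8"), ("r9", "r10")]
precision, recall = precision_recall(predicted, gold)
```

Here two of the three predicted links are correct and two of the four true links are found. Running the same calculation for several candidate configurations is one way to compare them before applying the chosen values in production.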
Use cases
Different projects might need to support one or more of the following use cases:
Client registry (HIE)
A Health Information Exchange (HIE) makes sharing health data across information systems possible. It enables data to be shared between databases, facilities, and across regions or countries.
The component in an HIE that executes patient matching is commonly called a Client Registry. These are the main workflows it must support:
- Create a patient demographic record
- Update a patient demographic record
- Query patient demographic records by identifier
- Query patient demographic records by demographics
The client registry implements a patient matching engine that operates in a transactional way. For the workflows above, every time a transaction is sent to the CR, the engine can search for patients that match and create links between them when required.
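The transactional behaviour described above can be sketched with a toy in-memory registry. Everything here is an assumption made for illustration: the class and method names, the field names, and the simple exact-match rule on normalised names and birth date. A real client registry persists its data and uses a configurable matching engine.

```python
import itertools


class ClientRegistry:
    """Toy in-memory registry (hypothetical, for illustration only)."""

    def __init__(self):
        self.records = {}  # record_id -> demographics dict
        self.links = {}    # record_id -> person (link group) id
        self._person_ids = itertools.count(1)

    @staticmethod
    def _key(demographics):
        # Assumed deterministic rule: same normalised names and birth date.
        return (
            demographics["given_name"].strip().lower(),
            demographics["family_name"].strip().lower(),
            demographics["birth_date"],
        )

    def query_by_identifier(self, record_id):
        return self.records.get(record_id)

    def query_by_demographics(self, demographics):
        key = self._key(demographics)
        return [rid for rid, rec in self.records.items() if self._key(rec) == key]

    def create(self, record_id, demographics):
        # Search for matches on every transaction before storing the record.
        matches = self.query_by_demographics(demographics)
        self.records[record_id] = dict(demographics)
        # Link to the matched person if there is one, else start a new person.
        self.links[record_id] = (
            self.links[matches[0]] if matches else next(self._person_ids)
        )
        return self.links[record_id]

    def update(self, record_id, demographics):
        # Re-run matching, because changed demographics may change the links.
        del self.records[record_id]
        return self.create(record_id, demographics)


registry = ClientRegistry()
first = registry.create(
    "clinic-1/001",
    {"given_name": "Jane", "family_name": "Doe", "birth_date": "1990-04-12"},
)
second = registry.create(
    "lab-7/883",
    {"given_name": " jane", "family_name": "DOE", "birth_date": "1990-04-12"},
)
third = registry.create(
    "clinic-2/417",
    {"given_name": "John", "family_name": "Roe", "birth_date": "1985-11-02"},
)
```

In this sketch the first two records end up linked to the same person, while the third starts a new one. The source records themselves are kept separate; only the link is shared.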
Deduplication in batch
Existing data captured over a period of time from different source systems needs to be deduplicated. For this requirement, the existing records that belong to the same person need to be merged and a new database with unique records and merged data needs to be created.
This use case might also require the ongoing batch deduplication of smaller datasets later on.
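A minimal sketch of the merge step follows, assuming the matching step has already grouped duplicate records into clusters. The field names and the survivorship rule (the newest non-empty value wins) are assumptions chosen for illustration; each project will need to define its own merge rules.

```python
def merge_records(records):
    """Merge records already judged to belong to one person into a single record."""
    merged = {}
    # Visit the newest record first so that its non-empty values win.
    for record in sorted(records, key=lambda r: r["updated"], reverse=True):
        for field, value in record.items():
            if value not in (None, "") and field not in merged:
                merged[field] = value
    return merged


def deduplicate(clusters):
    """Build the new database: one merged record per cluster of duplicates."""
    return [merge_records(cluster) for cluster in clusters]


# Hypothetical clusters produced by an earlier matching step.
clusters = [
    [
        {"id": "A1", "name": "Jane Doe", "phone": "", "updated": "2021-03-01"},
        {"id": "B9", "name": "Jane Doe", "phone": "555-0100", "updated": "2019-06-15"},
    ],
    [
        {"id": "C4", "name": "John Roe", "phone": None, "updated": "2020-01-20"},
    ],
]
unique_records = deduplicate(clusters)
```

The merged record for the first cluster keeps the identifier of the newest record and fills the missing phone number from the older one.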
Linking in batch
Records in an existing database captured over a period of time from different source systems need to be linked and form a group or cluster when they belong to the same person. Records are not usually merged, as each source system can continue managing its data.
It is common for Client Registries to use linking in batch to load data for the first time into the system.
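One common way to form groups from pairwise matches is a union-find (disjoint-set) structure, sketched below. It assumes links are transitive (if A matches B and B matches C, all three are grouped), which is a design choice rather than a requirement. The function name and record identifiers are hypothetical.

```python
def cluster_links(record_ids, matched_pairs):
    """Group records into clusters from pairwise matches (union-find)."""
    parent = {rid: rid for rid in record_ids}

    def find(rid):
        # Follow parent pointers to the cluster root, shortening the path as we go.
        while parent[rid] != rid:
            parent[rid] = parent[parent[rid]]
            rid = parent[rid]
        return rid

    for left, right in matched_pairs:
        parent[find(left)] = find(right)

    clusters = {}
    for rid in record_ids:
        clusters.setdefault(find(rid), []).append(rid)
    return list(clusters.values())


# Hypothetical record identifiers and matches.
record_ids = ["a", "b", "c", "d", "e"]
matched_pairs = [("a", "b"), ("b", "c")]
groups = cluster_links(record_ids, matched_pairs)
```

Unlike deduplication, the records are left untouched: the output only states which records belong together, so each source system can keep managing its own data.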
Evaluation
As a Health System implementer, I would like to use a tool to evaluate different algorithms for my project and find the optimal configuration, so that I can set up a Client Registry with those values.
Scale
Scale options
- Low scale:
  - Single hospital or hospital cluster
  - Databases of up to 20,000 records
- Medium scale:
  - Sub-national
  - Databases of up to 200,000 records
- High scale:
  - National
  - Databases of millions of records
Design recommendations
A blocking strategy needs to be implemented depending on the processing power available and the number of pairs to compare. Usually, medium and high scales require blocking.
A high-performance patient matching tool (multi-server architecture) is usually needed for high scales.
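To see why blocking matters, note that comparing every record with every other record requires n(n-1)/2 comparisons. The sketch below counts those pairs and shows a simple blocking step that only compares records sharing a blocking key. The choice of birth year as the key and the sample records are assumptions for illustration; a real project would choose keys based on its own data quality.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs_count(n):
    """Number of comparisons needed without blocking."""
    return n * (n - 1) // 2


def candidate_pairs(records, blocking_key):
    """Yield only the pairs of record ids that share a blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record["id"])
    for ids in blocks.values():
        yield from combinations(ids, 2)


# Hypothetical records; birth year is used as the blocking key.
records = [
    {"id": 1, "family_name": "Moyo", "birth_year": 1980},
    {"id": 2, "family_name": "Moyo", "birth_year": 1980},
    {"id": 3, "family_name": "Dube", "birth_year": 1992},
    {"id": 4, "family_name": "Dube", "birth_year": 1992},
    {"id": 5, "family_name": "Nkosi", "birth_year": 1980},
]
pairs = list(candidate_pairs(records, lambda r: r["birth_year"]))

low_scale_pairs = all_pairs_count(20_000)      # 199,990,000
medium_scale_pairs = all_pairs_count(200_000)  # 19,999,900,000
```

In this small example, blocking reduces the 10 possible pairs to 4 candidate pairs. The trade-off is that true matches whose blocking key differs (for example, because of a mistyped birth year) are never compared, so the key should be chosen from fields that are reliably captured.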
Data quality and algorithms required
Low-quality data requires more complexity in the way patient matching is implemented for it to be successful. Probabilistic algorithms, especially when machine learning is applied, consistently outperform deterministic algorithms when the data quality is low.
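The difference between the two families of algorithms can be illustrated with a small sketch. A deterministic rule here requires every field to agree, while a probabilistic score in the style of Fellegi-Sunter adds up a weight per field, so one disagreement does not automatically rule out a match. The m and u probabilities below (the chance of a field agreeing among true matches and among non-matches, respectively) are made-up values for illustration, and the fields are assumed to be conditionally independent.

```python
import math


def deterministic_match(agreements):
    """Assumed deterministic rule: every compared field must agree."""
    return all(agreements.values())


def match_weight(agreements, m, u):
    """Sum of log2 likelihood ratios over the compared fields."""
    weight = 0.0
    for field, agrees in agreements.items():
        if agrees:
            weight += math.log2(m[field] / u[field])
        else:
            weight += math.log2((1 - m[field]) / (1 - u[field]))
    return weight


# Hypothetical m and u probabilities, not estimated from real data.
m = {"given_name": 0.90, "family_name": 0.95, "birth_date": 0.98}
u = {"given_name": 0.01, "family_name": 0.02, "birth_date": 0.001}

# A pair where the given name was mistyped but the other fields agree.
typo_pair = {"given_name": False, "family_name": True, "birth_date": True}
full_pair = {"given_name": True, "family_name": True, "birth_date": True}

typo_is_deterministic_match = deterministic_match(typo_pair)
typo_weight = match_weight(typo_pair, m, u)
full_weight = match_weight(full_pair, m, u)
```

The deterministic rule rejects the pair with the typo outright, whereas the probabilistic weight stays positive because the agreeing fields outweigh the single disagreement. Whether that weight is high enough to link the records depends on the threshold chosen for the project.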
This is a possible categorisation for existing patient matching approaches and algorithms, ordered in level of complexity:
Guideline on how to choose a patient matching tool
Project profile examples
These are two recent projects in which Jembi is involved and where patient matching was implemented.
DISI MVP
The DISI MVP is an HIE implementation where a client registry is required. The data used for this project is synthetic, and it was generated to be of low quality. This project profile can be found below.
Based on this analysis, the tools selected are the following:
- Production tool: OpenCR was selected as the client registry, with a probabilistic configuration using the Fellegi-Sunter algorithm.
- Evaluation tool: The Fastlink R Notebook was selected as the evaluation tool to obtain the optimal configuration to be used in OpenCR.
Ethiopia interim solution
The Ethiopia example is an interim solution implemented to deduplicate existing data in a REDCap server. In future, a client registry is expected to be implemented.
The project diagram is as follows:
Based on this analysis, the tools selected are the following:
- Production tool: A package was created for this implementation using the existing Fastlink R package, together with a GUI for reviewing the links created. Fastlink implements the Fellegi-Sunter model, with parameters estimated by expectation-maximisation (EM).
- Evaluation tool: The same package was used for evaluation in this case.
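For readers unfamiliar with how EM is used with Fellegi-Sunter, the sketch below estimates the match proportion p and the per-field m and u probabilities from unlabelled agreement patterns. It is a simplified Python illustration, not the Fastlink API or its implementation: it assumes binary agree/disagree comparisons, conditional independence between fields, and made-up starting values and data.

```python
def clamp(value, eps=1e-6):
    """Keep probabilities away from exactly 0 or 1."""
    return min(max(value, eps), 1 - eps)


def em_fellegi_sunter(patterns, iterations=50, p=0.1, m_init=0.9, u_init=0.1):
    """Estimate p, m and u from binary agreement patterns with EM."""
    n_fields = len(patterns[0])
    m = [m_init] * n_fields
    u = [u_init] * n_fields
    for _ in range(iterations):
        # E-step: posterior probability that each pair is a match.
        posteriors = []
        for pattern in patterns:
            like_match, like_non = p, 1 - p
            for k, agrees in enumerate(pattern):
                like_match *= m[k] if agrees else 1 - m[k]
                like_non *= u[k] if agrees else 1 - u[k]
            posteriors.append(like_match / (like_match + like_non))
        # M-step: re-estimate the parameters from the posteriors.
        total_match = max(sum(posteriors), 1e-12)
        total_non = max(len(patterns) - sum(posteriors), 1e-12)
        p = clamp(sum(posteriors) / len(patterns))
        for k in range(n_fields):
            agree_match = sum(g for g, pat in zip(posteriors, patterns) if pat[k])
            agree_non = sum(1 - g for g, pat in zip(posteriors, patterns) if pat[k])
            m[k] = clamp(agree_match / total_match)
            u[k] = clamp(agree_non / total_non)
    return p, m, u


# Hypothetical agreement patterns for 100 record pairs over three fields
# (1 = the field agrees, 0 = it disagrees).
patterns = (
    [(1, 1, 1)] * 10
    + [(1, 1, 0)] * 3
    + [(0, 0, 0)] * 70
    + [(1, 0, 0)] * 12
    + [(0, 0, 1)] * 5
)
p_hat, m_hat, u_hat = em_fellegi_sunter(patterns)
```

The appeal of this approach is that it does not need labelled training data: the m and u probabilities are learned from the agreement patterns themselves, which is useful when a gold standard is not available.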
Questions to consider
- How will the data grow over time?
- How often will data characteristics change?
- How variable are data characteristics between different regions?
- Are there any specific security and confidentiality requirements to consider?
- What are the different sources of data?
- Are there different requirements in reporting that require potentially different recall and precision measurements?