In Depth:

Master Person Index

Deep Data Insight Master Person Index (MPI) provides a fast, effective and secure solution for many sectors, including health, insurance, engineering and education. In health science, the MPI product is useful for patient coordination, health management and health analytics. It is also a customizable, standards-based solution that can be adapted to any sector.

This section mainly covers the functionality of the Deep Data Insight MPI module. For more details, refer to the Deep Data Insight Master Patient Index for Health Science white paper.

MPI Module objective

The main objective of the MPI module is to maintain and update a central index of person records regardless of where each record originated (person records come from different data sources). If a person is recorded in two different data sources with the same or closely related information, the task of the MPI is to create a single record for this person and maintain a reference (mapping) between this single unique record and the respective records in the different data sources.

Management and Control

Our MPI module operates in four main phases: configuration, training, ambiguity resolution and matching. All of these are controlled through API calls.

Input, Output and Utilities

The module can work with database tables or structured data files of any format; which of the two is used is set during the configuration phase. By default, the system works with database tables. The module interacts with three tables: a source table with person records, the MPI table and the MPI Reference table. These tables are the inputs to the module, and at the end of the training and matching phases some or all of them are updated.

Table structures

Regardless of whether the inputs are database tables or structured data files, they follow a defined structure.

Table 1: Source Table (Data to be Matched)

Source ID	Company ID	First Name	Last Name	Gender	Date of Birth	Address	…	MPI ID	Probability
1111	c1	Jacky	Reevs	m	20/05/1985	…	…	null	null
2030	c1	Jane	Parker	f	6/2/1990	…	…	null	null
7467	c2	Aron	Davis	m	10/10/1980	…	…	null	null
7878	c2	Jackie	Reeves	m	20/05/1985	…	…	null	null
2222	c3	Jack	Rives	m	12/5/1985	…	…	null	null

If the original source data is in a different format, a pre-processing step brings the source table into a format similar to the one above.

The fields on which matching is performed are set during the configuration phase.

Table 2: MPI Table

MPI ID	First Name	Last Name	Gender	Date of Birth	Address	…
1	Peter	Jackson	m	10/5/1975	…	…
2	Gabrial	Smith	f	16/3/1981	…	…
3	Aron	Davis	m	10/10/1980	…	…
4	Jackie	Reeves	m	20/05/1985	…	…

Table 3: MPI Reference Table

Reference ID	MPI ID	Probability	Company ID	Alternative ID
c3-4000	1	1	c3	null
c1-2292	2	0.93	c1	null
c2-7467	3	1	c2	null
c2-7878	4	1	c2	null

From Table 2 (MPI table) and Table 3 (MPI Reference table), the following observations can be made:

A person from Company C3 with the id 4000 has the same information as the MPI record with the MPI id 1.
A person from the company C1 with the id 2292 has a 93% match with the MPI record with the id 2.
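The two observations above can be read directly off a reference row. The following is a minimal sketch, assuming that a Reference ID is formed as the company ID, a hyphen and the source ID, which is how the sample rows appear; the class and function names are hypothetical and not part of the MPI module's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MpiReference:
    """One row of the MPI Reference table (hypothetical in-memory form)."""
    reference_id: str
    mpi_id: int
    probability: float
    company_id: str
    alternative_id: Optional[int] = None

    @property
    def source_id(self) -> str:
        # Assumption: reference_id looks like "<company id>-<source id>".
        prefix = self.company_id + "-"
        if self.reference_id.startswith(prefix):
            return self.reference_id[len(prefix):]
        return self.reference_id


# Rows copied from the MPI Reference table above.
REFERENCES = [
    MpiReference("c3-4000", 1, 1.0, "c3"),
    MpiReference("c1-2292", 2, 0.93, "c1"),
    MpiReference("c2-7467", 3, 1.0, "c2"),
    MpiReference("c2-7878", 4, 1.0, "c2"),
]


def describe(ref: MpiReference) -> str:
    """Render a reference row as a plain-language observation."""
    return (
        f"Source record {ref.source_id} from company {ref.company_id} "
        f"matches MPI record {ref.mpi_id} at {ref.probability:.0%}"
    )


for ref in REFERENCES:
    print(describe(ref))
```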

From Table 1 (Source table), the following observations can be made:

‘Jacky Reevs’ from C1 and ‘Jackie Reeves’ from C2 are the same person and should have two separate entries in the MPI Reference table, both linking to the MPI record with the id 4, but only ‘Jackie Reeves’ from C2 has a reference entry.
‘Jack Rives’ from C3 could be the same person as the above, but the name and part of the date of birth cast doubt on this.
‘Aron Davis’ from C2 already has an MPI entry and an MPI Reference entry.
‘Jane Parker’ from C1 is a completely new person and has neither an MPI entry nor an MPI Reference entry.

Configuration phase

The configuration phase does not need to be run frequently. Unless the table or data structures change, it is mostly a one-time task.
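As an illustration of what such a one-time configuration might hold, here is a minimal sketch. It collects only the settings this document mentions (input type, matching fields and the two matching thresholds described in the next section). All field names are hypothetical, and the threshold defaults are the example values from the text, not product defaults.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MpiConfig:
    """Hypothetical container for the settings fixed in the configuration phase."""
    # The document states that database tables are the default input.
    input_kind: str = "database"
    # Fields used for matching; this particular selection is an assumption.
    match_fields: Tuple[str, ...] = (
        "first_name", "last_name", "gender", "date_of_birth", "address",
    )
    # Example threshold values quoted in the text ("say 0.85", "say 0.45").
    upper_threshold: float = 0.85
    lower_threshold: float = 0.45

    def __post_init__(self) -> None:
        if self.input_kind not in ("database", "file"):
            raise ValueError("input_kind must be 'database' or 'file'")
        if not self.match_fields:
            raise ValueError("at least one matching field is required")
        if not 0.0 <= self.lower_threshold < self.upper_threshold <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= lower < upper <= 1")


config = MpiConfig()
print(config)
```

Validating the thresholds when the configuration is created keeps an inverted pair (lower above upper) from reaching the matching phase.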

Ambiguity Resolution phase

On the initial run, this phase is not executed, since both the MPI and MPI Reference tables are empty at the beginning. During the matching process, the MPI module finds the closest matching record in the MPI table for each record in the source table.

If a match is found with a score higher than the pre-configured upper threshold value (say 0.85), the module automatically links the source table record to the matched entry.

If the closest match has a score lower than the pre-configured lower threshold value (say 0.45), the module automatically decides that the source record does not have an entry in the MPI and adds a record to both the MPI and MPI Reference tables.

If the closest match has a score between the two thresholds, the module does not decide definitively whether there is a match. Instead, it adds a record to both the MPI and MPI Reference tables, retains information about the closest match, and sets a flag indicating that this entry needs manual ambiguity resolution.

During the manual ambiguity resolution phase, a reviewer looks at the entries with the flag set and decides either to keep them as new records in the MPI or to remove the new entry and keep a reference to the closest MPI record.
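The three threshold rules above can be sketched as a small decision function. This is an illustration only: the function and enum names are hypothetical, the default thresholds are the example values from the text, and treating a score exactly equal to a threshold as ambiguous is an assumption, since the text does not say how boundary values are handled.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    LINK = "link source record to the closest MPI record"
    NEW = "add a new MPI record and reference"
    REVIEW = "add a new MPI record and flag it for manual resolution"


def decide(best_score: Optional[float],
           upper: float = 0.85,
           lower: float = 0.45) -> Decision:
    """Map the closest-match score to one of the three outcomes."""
    if best_score is None:
        # Empty MPI table (e.g. the initial run): nothing to match against.
        return Decision.NEW
    if best_score > upper:
        return Decision.LINK
    if best_score < lower:
        return Decision.NEW
    # Assumption: scores equal to a threshold fall in the ambiguous band.
    return Decision.REVIEW


for score in (0.86, 0.53, 0.20, None):
    print(score, "->", decide(score).name)
```

With the example thresholds, the scores 0.86 and 0.53 that appear in Table 4 fall into the automatic-link and manual-review bands respectively.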

Training phase

After each matching iteration, new entries are added to the MPI Reference table, or to both the MPI Reference and MPI tables. If any ambiguous records were added during the matching phase, a manual ambiguity resolution phase is conducted. This is done offline, on a separate day from the day on which the matching process was carried out. After ambiguity resolution, the module is trained with all the entries in the MPI table. This is also done offline, before the next day on which the matching process is expected to run.

In this phase, the module uses the entries in the MPI table to train itself, so that all calculations and matching criteria are ready for the next matching phase.

Table 4: Source Table

Source ID	Company ID	First Name	Last Name	Gender	Date of Birth	Address	…	MPI ID	Probability
1111	c1	Jacky	Reevs	m	20/05/1985	…	…	4	0.86
2030	c1	Jane	Parker	f	6/2/1990	…	…	5	1
7467	c2	Aron	Davis	m	10/10/1980	…	…	3	1
7878	c2	Jackie	Reeves	m	20/05/1985	…	…	4	1
2222	c3	Jack	Rives	m	12/5/1985	…	…	4	0.53

Table 5: MPI

MPI ID	First Name	Last Name	Gender	Date of Birth	Address	…
1	Peter	Jackson	m	10/5/1975	…	…
2	Gabrial	Smith	f	16/3/1981	…	…
3	Aron	Davis	m	10/10/1980	…	…
4	Jackie	Reeves	m	20/05/1985	…	…
5	Jane	Parker	f	6/2/1990	…	…
6	Jack	Rives	m	12/5/1985	…	…

Table 6: MPI Reference

Reference ID	MPI ID	Probability	Company ID	Need Inspection	Alternative ID
c3-4000	1	1	c3	0	null
c1-2292	2	0.93	c1	0	null
c2-7467	3	1	c2	0	null
c2-7878	4	1	c2	0	null
c1-1111	4	0.86	c1	0	null
c1-2030	5	1	c1	0	null
c3-2222	4	0.53	c3	1	6

According to the results in Tables 4, 5 and 6, only one record was added that needs ambiguity resolution. During the ambiguity resolution phase, the user can either keep the MPI entry with id 6 as a new record, in which case the MPI module considers ‘Jack Rives’ a different person, or remove the MPI entry with id 6 and keep the link to the MPI entry with id 4, in which case the MPI module considers ‘Jack Rives’ and ‘Jackie Reeves’ the same person.
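The two possible outcomes for the flagged row `c3-2222` can be sketched as follows. This is a hedged illustration, not the module's actual procedure: the names are hypothetical, the MPI table is reduced to a set of ids, and leaving the stored probability untouched after resolution is an assumption, because the text does not say what happens to it.

```python
from dataclasses import dataclass, replace
from typing import Optional, Set


@dataclass(frozen=True)
class Reference:
    """One row of the MPI Reference table (hypothetical in-memory form)."""
    reference_id: str
    mpi_id: int
    probability: float
    company_id: str
    need_inspection: bool
    alternative_id: Optional[int]


def resolve(ref: Reference, mpi_ids: Set[int], same_person: bool) -> Reference:
    """Apply a reviewer's decision to a flagged reference row."""
    if not ref.need_inspection or ref.alternative_id is None:
        raise ValueError("reference is not flagged for inspection")
    if same_person:
        # Remove the provisional MPI entry and keep the link to the closest match.
        mpi_ids.discard(ref.alternative_id)
        return replace(ref, need_inspection=False, alternative_id=None)
    # Keep the provisional MPI entry and point the reference at it.
    return replace(ref, mpi_id=ref.alternative_id,
                   need_inspection=False, alternative_id=None)


flagged = Reference("c3-2222", 4, 0.53, "c3", True, 6)

ids_if_merged = {1, 2, 3, 4, 5, 6}
merged = resolve(flagged, ids_if_merged, same_person=True)
print(merged.mpi_id, sorted(ids_if_merged))

ids_if_kept = {1, 2, 3, 4, 5, 6}
kept = resolve(flagged, ids_if_kept, same_person=False)
print(kept.mpi_id, sorted(ids_if_kept))
```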

The matching process can be run on different source tables as many times as the user desires. However, until the training process is run again, the newly added MPI and MPI Reference records have no effect on subsequent runs of the matching process.
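This behaviour is what one would expect if matching works against a snapshot taken at training time. The sketch below illustrates that idea only. The class is hypothetical, and the similarity score uses Python's `difflib.SequenceMatcher` purely as a stand-in, since this document does not describe the module's actual scoring method.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


class SnapshotMatcher:
    """Matches names against the MPI entries seen at the last training run."""

    def __init__(self) -> None:
        self._snapshot: Dict[int, str] = {}

    def train(self, mpi_table: Dict[int, str]) -> None:
        # Copy the table so later additions are invisible until retraining.
        self._snapshot = dict(mpi_table)

    def closest(self, name: str) -> Optional[Tuple[int, float]]:
        """Return (MPI id, score) of the most similar trained entry, if any."""
        if not self._snapshot:
            return None

        def score(item: Tuple[int, str]) -> float:
            # Stand-in similarity measure, not the module's real scoring.
            return SequenceMatcher(None, name.lower(), item[1].lower()).ratio()

        best = max(self._snapshot.items(), key=score)
        return best[0], score(best)


mpi_table = {1: "Peter Jackson", 2: "Gabrial Smith",
             3: "Aron Davis", 4: "Jackie Reeves"}
matcher = SnapshotMatcher()
matcher.train(mpi_table)

# A matching run adds a new MPI record, but the matcher is not retrained yet.
mpi_table[5] = "Jane Parker"
before = matcher.closest("Jane Parker")

# After retraining, the new record takes part in matching.
matcher.train(mpi_table)
after = matcher.closest("Jane Parker")
print(before, after)
```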