Rare Disease Data Trust and Deep Data Insight’s EDDIE application:

Using OCR, ICR, NLU, AI and Machine Learning to save lives from the curse of Rare Diseases.



According to the National Institutes of Health, there may be as many as 7,000 Rare and ultra-rare diseases (collectively, Rare Diseases), and the total number of Americans living with a Rare Disease is estimated at 25 to 30 million. As a result, “while individual diseases may be rare, the total number of people with a Rare Disease is large.” Individuals who suffer from them are often lost in the healthcare system for years, both because of a lack of awareness of the signs and symptoms of these diseases and because many Rare Diseases manifest symptoms similar to those of more common illnesses. Individuals with Rare Diseases typically experience a five- to eight-year ‘diagnostic odyssey’ from onset of symptoms to diagnosis, if they are accurately diagnosed at all.

In 2020, Rare Disease Data Trust (RDDT) partnered with Artificial Intelligence experts Deep Data Insight (DDI) to see how this problem could be addressed using bleeding-edge technology. The solution that resulted from the partnership required an array of linked elements to work together. These elements were:

  • Optical Character Recognition (OCR)
  • Intelligent Character Recognition (ICR)
  • Natural Language Processing (NLP)
  • Natural Language Understanding (NLU)
  • Machine Learning (ML)
  • Artificial Intelligence (AI)

We believe that this is the first time ever that a combination of these solutions has been used to such effect in the health sector.


During the course of the diagnostic odyssey, the best therapeutic window may elapse, while the disease and its severity progress, often with the onset of irreversible damage.  Rare Disease patients frequently endure years of misdiagnosis, incorrect treatments, high costs, and negative physical and emotional consequences that ensue from this process.

Difficulty identifying patients who suffer from Rare Diseases is a major obstacle to effective diagnosis and treatment. Because the incidence of each such disease is low, useful detection must draw on population-scale databases such as electronic health records (EHRs) or web-based applications, including laboratory, prescription (Rx), and diagnostic activity logs as well as structured and unstructured data, and must therefore be based on computational models.

While it is possible to develop these models through consultation with experts, this strategy can be problematic for several reasons, including the incomplete nature of experts’ understanding, the complexity of disease processes, the disparate nature or lack of data sharing, and the cognitive biases associated with human decision-making. Human reviewers also cannot assimilate terabytes of structured and unstructured data in a timely fashion, much less in their entirety.

Therefore, a technology solution capable of identifying Rare Disease Patients in an efficient, successful process needed to be designed, tested, proven, and implemented.

This would be a complex solution, as it had to ingest data easily, index all data formats in usable form, and make use of paper and digitized notes, scanned images, diagnostic images, prescription (Rx) records, lab results, and genetic testing.


By partnering with Health Systems, on the one hand, and with Biopharma, on the other, RDDT and DDI sought to align the data needs of each to help patients with Rare Diseases discover their diagnoses earlier in the course of their disease progression, end the diagnostic odyssey, and, in doing so, improve health outcomes for Rare Disease patients and reduce costs to all stakeholders.

Rapid diagnosis in Rare Disease requires computational models that make patient identification and diagnosis both more careful and more efficient. By combining the most advanced technology with DDI’s proprietary AI application, ‘EDDIE’, we break down those barriers to early detection.

Critical to the effort is the capability of scanning and reading unstructured data using EDDIE. EDDIE uses not only OCR, ICR, and NLP but also a proprietary application of NLU, which uses AI both to rapidly identify and to predict missing data or gaps in the diagnosis that might otherwise go undetected. By identifying Machine Learning (ML) patterns specific to a chosen disease in a large commercial data set, we developed two deep learning models. Then, using our proprietary Artificial Intelligence (AI) application, we set out to identify the disease state and suspect isolate patients.
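As a rough illustration of what reading unstructured data involves, the sketch below pulls diagnosis-code-like tokens out of free text that has already been through OCR. It is a deliberately simple stand-in, using a regular expression rather than NLU, and it is not EDDIE’s method. The pattern is a simplified approximation of the ICD-10 code shape, and the note text and the function name extract_codes are invented for the example.

```python
import re

# Simplified, assumed approximation of an ICD-10 code: a letter, two digits,
# and an optional dotted extension. Real code validation is stricter.
ICD10_PATTERN = re.compile(r"\b[A-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b")


def extract_codes(note_text):
    """Return ICD-10-like codes in order of first appearance, without duplicates."""
    seen = []
    for match in ICD10_PATTERN.findall(note_text):
        if match not in seen:
            seen.append(match)
    return seen


# Hypothetical clinical note, written for this example only.
note = (
    "Pt reports brittle nails (L60.3). Hx of neuropathy; rule out E85.2. "
    "Follow-up re: L60.3."
)
print(extract_codes(note))
```

A production system would also need to handle negation (“rule out”), misspellings, and findings described in words rather than codes, which is where NLU goes beyond pattern matching.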

We removed barriers to entry and alleviated the burden on IT departments by ingesting data through our proprietary ingestion engine, Rosetta, which can use all data types from multiple data sources without complication. This technology advancement from DDI allows RDDT to assimilate structured and unstructured data through Rosetta and EDDIE, then use pattern development and AI to identify biomarker and disease profiles, resulting in deep learning models which then identify suspect isolate patients.


We took a large dataset of patient information, 3 million records, and created a model of E85.X patients. This data set produced 335 unique patients, who went into a training set of 71,000 total records, and ML/AI and NLU were used to identify every possible correlational data point.
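The cohort-building step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project’s pipeline: it assumes each record is a (patient id, ICD-10 code) pair, that cohort membership means having at least one code in the E85 category, and that the training set keeps every record belonging to a cohort patient. The records and function names are made up.

```python
# Toy records: (patient_id, ICD-10 code). Entirely invented for illustration.
RECORDS = [
    ("p1", "E85.2"),
    ("p1", "L60.3"),
    ("p2", "I10"),
    ("p3", "E85.4"),
    ("p3", "E85.4"),
    ("p4", "E11.9"),
]


def select_cohort(records, category="E85"):
    """Return the set of patient ids with at least one code in the given category."""
    return {pid for pid, code in records if code.split(".")[0] == category}


def training_rows(records, cohort):
    """Keep every record, matching or not, for patients in the cohort."""
    return [(pid, code) for pid, code in records if pid in cohort]


cohort = select_cohort(RECORDS)
rows = training_rows(RECORDS, cohort)
print(sorted(cohort), len(rows))
```

Keeping the non-matching records of cohort patients is what allows co-occurring codes to surface later as candidate correlations.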

Patterns were recognized from correlational data and, once the data model had been reviewed by healthcare professionals for bias and retrained, the models identified three patients:

  • Patient 1 was isolated with a frequency of disease pattern suggesting the need for further testing for E85.2
  • Patient 2 was isolated with a frequency of disease pattern suggesting the need for further diagnosis which might lead to E85.2
  • Patient 3 was isolated with biased data which would rule out the diagnosis

We conferred with Subject Matter Experts at each phase for confirmation of information and removal of bias. Suspect isolates are then passed to providers to verify the disease state through additional testing and begin the appropriate treatment for the benefit of the patient.
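The idea of isolating patients by how much of a disease pattern appears in their record can be sketched with a simple overlap score. This is a swapped-in technique chosen for clarity, not the deep learning models described above, and the marker labels, the 0.75 threshold, and the patient records are all hypothetical.

```python
# Hypothetical marker set for a target condition. The labels are placeholders,
# not features learned by the models described in this article.
PATTERN = {"marker_a", "marker_b", "marker_c", "marker_d"}


def pattern_score(patient_markers, pattern=PATTERN):
    """Fraction of pattern markers present in a patient's record (0.0 to 1.0)."""
    return len(pattern & set(patient_markers)) / len(pattern)


def triage(patients, threshold=0.75):
    """Split patient ids into those flagged for clinical review and those not flagged."""
    flagged, not_flagged = [], []
    for pid, markers in patients.items():
        target = flagged if pattern_score(markers) >= threshold else not_flagged
        target.append(pid)
    return flagged, not_flagged


# Invented example records.
patients = {
    "patient_1": ["marker_a", "marker_b", "marker_c", "marker_d"],
    "patient_2": ["marker_a", "marker_b", "marker_c"],
    "patient_3": ["marker_a"],
}
flagged, not_flagged = triage(patients)
print(flagged, not_flagged)
```

A flag here only means that a record is passed on for review and further testing; it is not a diagnosis.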

It is important to note that this proof of concept identified a biomarker, nail atrophy (L60.3), which was found in year two in 80% of diagnosed patients and formed part of the pattern corresponding with E85.2, suggesting earlier detection and treatment timelines. This biomarker had not been recognized as a part of the diagnosis until this discovery, which makes the results meaningful for determining a disease state early in the diagnostic odyssey.
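A prevalence figure like the one above is the share of diagnosed patients whose record contains a given code, and it is most informative when compared with the same share in a comparison group. The sketch below shows that arithmetic on toy data, which is constructed to yield 80% and is not the study data. The comparison group and the function name code_prevalence are assumptions added for illustration.

```python
def code_prevalence(patient_codes, cohort, code):
    """Share of cohort patients whose record contains the given code."""
    if not cohort:
        raise ValueError("cohort is empty")
    hits = sum(1 for pid in cohort if code in patient_codes.get(pid, set()))
    return hits / len(cohort)


# Invented records: d* are diagnosed patients, c* are a comparison group.
PATIENT_CODES = {
    "d1": {"E85.2", "L60.3"},
    "d2": {"E85.2", "L60.3"},
    "d3": {"E85.2", "L60.3"},
    "d4": {"E85.2", "L60.3"},
    "d5": {"E85.2"},
    "c1": {"I10", "L60.3"},
    "c2": {"I10"},
    "c3": {"E11.9"},
    "c4": {"J45.909"},
    "c5": {"I10"},
}
diagnosed = ["d1", "d2", "d3", "d4", "d5"]
comparison = ["c1", "c2", "c3", "c4", "c5"]

in_diagnosed = code_prevalence(PATIENT_CODES, diagnosed, "L60.3")
in_comparison = code_prevalence(PATIENT_CODES, comparison, "L60.3")
print(f"diagnosed: {in_diagnosed:.0%}, comparison: {in_comparison:.0%}")
```

A large gap between the two shares is what would make a code worth reviewing as a candidate early marker; clinical confirmation is still required.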

CTO of RDDT, J Mark Tumblin, summarized as follows:

“Individuals with Rare Diseases typically experience a five- to eight-year ‘diagnostic odyssey’ from onset of symptoms to diagnosis. Many exhibit symptoms that are deemed insignificant but would shorten the diagnostic odyssey if detected. When combined, they are prevalent enough to ruin thousands of lives, but in isolation, they are not visible enough to warrant high-priority healthcare funding.

“This is where Artificial Intelligence can play a role. By reducing the cost of manual work associated with data checks, we immediately remove an enormous workforce effort and financial barrier. By using aspects of AI technology, we increase our ability to detect suspect patients exponentially.

“It is safe to say that without EDDIE, the diagnostic odyssey would result in lives being lost.”

CEO of DDI, Jeewa Perera, commented:

“Deep Data Insight make innovative data science solutions that address real-world problems in individual sectors, but that can then be reused in other cases and other sectors.

“We are specialists in combining disparate data science elements into one solution.

“EDDIE is one of a kind. There is no other solution capable of accurately processing data from multiple sources at immense speed, and then making ultra-reliable predictions that have real-world use.

“EDDIE is an incredible tool but is just one solution that has emerged from the world-class Deep Data Insight AI Engine.”