The Region 4 Stork (R4S) Collaborative Project, Part 3: Post-Analytical Interpretive Tools
Newborn screening has been very successful in identifying newborns with classic syndromes. However, many newborns have complex metabolic profiles that are difficult to interpret, leading to false positives and missed diagnoses. To improve detection of true positives, the Regional Genetics Collaborative project, funded by the Health Resources and Services Administration, created the Region 4 Stork Collaborative Project, which uses Mayo-developed software for post-analytical interpretation of complex metabolic profiles.
In R4S Collaborative Project Part 3, Piero Rinaldo, M.D., Ph.D., explains how to use the post-analytical interpretive tools.
I have a disclosure to make: a provisional patent application related to some of the content of this presentation has been submitted by Mayo Clinic. The title of the application is “Computer-Based Dynamic Data Analysis.”
This presentation focuses on the second generation of R4S tools. It will take this and two more segments to complete the overview of these tools and of their clinical utility.
We begin with a summary of the productivity tools described previously. What they all have in common is the goal of collecting and analyzing the data needed to establish clinically validated cutoff target ranges for the markers measurable by tandem mass spectrometry in neonatal dried blood spots.
This publication summarized the status of the R4S project at the end of 2010. At that time, a total of 5,341 cutoff values had been uploaded to the R4S database, and in the paper they were sorted according to their position relative to the corresponding target range. The results of this analysis were as follows: 2,269 cutoffs (42%) were set properly within the limits of the respective target range; 788 (15%) were positioned unnecessarily close to the reference range and were likely to cause recurrent false-positive outcomes. The remaining 2,284 (43%) had the opposite and more concerning problem: these cutoff values overlapped with the disease range of one or more conditions and were therefore intrinsically at risk of missing affected cases, resulting in false-negative events. This observation was compounded by the fact that more than 40% of these cutoff values with an intrinsic risk of low sensitivity were applied to 37 analytes and ratios that had no overlap between reference and disease ranges. Despite this compelling evidence, there was only a limited response to the publication of this work, and to date only a small fraction of the inadequate cutoff values have been corrected by the laboratories participating in the R4S project.
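The classification used in that analysis can be sketched for a "high" marker as follows. This is a minimal illustration, not the R4S implementation: `target_low` and `target_high` are hypothetical parameters standing for the lower and upper limits of a marker's target range, whose exact percentile definitions are given in the original publication rather than here. The counts are the ones reported above.

```python
def classify_cutoff(cutoff: float, target_low: float, target_high: float) -> str:
    """Classify the cutoff of a "high" marker against its target range.

    target_low and target_high are hypothetical inputs: the lower and
    upper limits of the marker's cutoff target range.
    """
    if cutoff < target_low:
        # Too close to the reference range: risk of recurrent false positives
        return "too close to reference range"
    if cutoff > target_high:
        # Overlaps the disease range: risk of false negatives
        return "overlaps disease range"
    return "within target range"


def percentages(counts: dict[str, int]) -> dict[str, int]:
    """Express category counts as whole-number percentages of the total."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


# Counts reported for the 5,341 cutoffs in the R4S database at the end of 2010
counts = {
    "within target range": 2269,
    "too close to reference range": 788,
    "overlaps disease range": 2284,
}
print(sum(counts.values()))  # 5341
print(percentages(counts))   # 42, 15 and 43 percent, respectively
```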
For this reason, it became a priority to rely more on pattern recognition and the interpretation of complex metabolic profiles than on cutoff values, even when their clinical validation had been accomplished. In this presentation, the content of the second publication of the collaborative project will be illustrated to show how the goal of interpreting newborn screening results without analyte cutoff values could be achieved.
This new table shows the basic characteristics of the post-analytical tools. The two columns on the right side are now obsolete: since 2011 it has been fully possible to interpret newborn screening profiles without using cutoff values and the related target ranges.
In their place, the summary table of the post-analytical tools includes information about the applicability of each tool to either one or multiple cases, and a synopsis of the standardized interpretation guidelines included in each tool.
Before discussing specific tools, it is important to give a brief overview of how these tools are actually created. This process is called the Tool Builder.
What is the Tool Builder? The Tool Builder is the foundation of a multivariate pattern recognition software that creates either completely new or modified versions of the different types of post-analytical interpretive tools. When a user is given access to this function, he or she can make a customized, site-specific tool by following a sequence of 10 relatively simple steps, shown on the right side of the slide; after proper training, the process can take as little as 5 minutes. The post-analytical tools are instruments of clinical utility and work best when used to answer one of three types of questions: a yes or no question (does a patient have a specific condition or not?); a differential diagnosis between two conditions with a similar biochemical phenotype (for example, VLCAD deficiency and VLCAD carrier status); and the simultaneous yes or no question for more than two, and up to hundreds of, conditions, provided that analyte disease ranges have been properly established to allow the release of a tool targeting each of those conditions. A defining characteristic of the post-analytical tools is the evolution of clinical validation from the conventional static process, usually performed early during test development, to a dynamic refinement of the disease ranges that continues to improve throughout the entire test life cycle.
In the following slides, the basic concepts of the post-analytical tools will be discussed. First, we will compare the benefits of a parallel rather than sequential evaluation of potentially informative markers. As an example, let’s use a hypothetical condition where the biochemical phenotype is defined by two informative markers, A and B, and two related ratios (the A to C ratio and the B to C ratio). In a traditional sequential model, a decision point is set for each marker by a fixed cutoff value, shown here as generic letters (W, X, and Y). Assuming these cutoffs are above the respective reference range (in other words, they are “high” markers), if the patient’s value is below the cutoff at any of these decision points, the outcome is a negative result. Only if all four markers are abnormal will the test be interpreted as positive. The insert shows an actual sequential algorithm of comparable complexity, one that applies to the combined interpretation of 5 steroids measured as part of a second-tier newborn screening test for congenital adrenal hyperplasia. The most useful marker from a diagnostic perspective is a calculated ratio: the sum of two precursors, 17OH progesterone and androstenedione, divided by the end product of that biosynthetic pathway, cortisol. Based on a statistical analysis, the cutoff value for this ratio was set at 2.5. It can be argued that a marginal difference of just 0.1 in the calculated ratio, for example 2.45 versus 2.55, is unlikely to reflect a true separation between true positives and true negatives, yet it is enough to produce opposite outcomes at this decision point.
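A sequential evaluation of this kind can be sketched in Python. This is an illustrative sketch, not the algorithm shown on the slide: the function names and the steroid values below are hypothetical, and only the form of the ratio and the 2.5 cutoff come from the text. Treating a value exactly at the cutoff as abnormal is also an assumption.

```python
def cah_ratio(ohp17: float, androstenedione: float, cortisol: float) -> float:
    """(17OH progesterone + androstenedione) / cortisol."""
    return (ohp17 + androstenedione) / cortisol


RATIO_CUTOFF = 2.5  # cutoff value quoted in the text


def sequential_positive(values: list[float], cutoffs: list[float]) -> bool:
    """Sequential model for "high" markers.

    Each value is compared with its own fixed cutoff in order; the first
    value below its cutoff ends the evaluation with a negative result.
    """
    for value, cutoff in zip(values, cutoffs):
        if value < cutoff:
            return False
    return True  # positive only if every decision point is abnormal


# Two hypothetical cases whose ratios differ by only 0.1
ratio_a = cah_ratio(2.0, 0.45, 1.0)  # about 2.45
ratio_b = cah_ratio(2.0, 0.55, 1.0)  # about 2.55
print(sequential_positive([ratio_a], [RATIO_CUTOFF]))  # False
print(sequential_positive([ratio_b], [RATIO_CUTOFF]))  # True
```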
Sequential algorithms are also affected by yet another source of variability: the construction of rules according to the “AND/OR” model. In a pure “AND” model, each element of the algorithm must meet the definition of abnormal, and a single exception is sufficient to trigger a negative outcome, regardless of the magnitude of the other results. Because of this intrinsic inflexibility, algorithms may include the concept of “OR”. For example, if either analyte A OR analyte B exceeds the respective threshold, that is sufficient evidence to overrule the other result and move on to the next decision point, if any. In an entirely “OR” model, like the one shown on the far left, as long as one of the potentially informative markers exceeds the respective cutoff value, the cumulative interpretation is that the test is positive. Obviously, there could be countless combinations of these rules, which rapidly become almost impossible to memorize and are challenging to keep updated in a laboratory information system. For these reasons, a parallel algorithm is intuitively a better process.
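The difference between these rule types is easy to see with boolean flags, one per marker (True meaning "above its cutoff"). The mixed rule below is a hypothetical example of one of the many possible combinations, not a rule taken from any laboratory system.

```python
def and_model(abnormal: list[bool]) -> bool:
    """Pure AND model: a single normal marker makes the result negative."""
    return all(abnormal)


def or_model(abnormal: list[bool]) -> bool:
    """Pure OR model: a single abnormal marker makes the result positive."""
    return any(abnormal)


def mixed_model(a: bool, b: bool, a_to_c: bool, b_to_c: bool) -> bool:
    """Hypothetical mixed rule: (A OR B) AND (A/C ratio) AND (B/C ratio)."""
    return (a or b) and a_to_c and b_to_c


# Markers A, B, A/C and B/C: only B is within its reference range
flags = [True, False, True, True]
print(and_model(flags))     # False
print(or_model(flags))      # True
print(mixed_model(*flags))  # True
```

The same four results give three different answers depending only on how the rules are combined, which is the variability described above.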
In a parallel model, all analytes are considered simultaneously. The entire profile of primary data is processed by a condition-specific post-analytical interpretive tool. The tool calculates a cumulative score based on the pattern of all analytes expected to be informative, and the score is expressed as a percentile rank in comparison to all known cases with the target condition. The score and its relative ranking are then applied to standardized guidelines that allow the overall profile to be interpreted as either negative or positive for that condition.
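Expressing a score as a percentile rank can be sketched as below. The exact percentile-rank definition used by R4S is not given here; this sketch assumes the percentage of true positive scores that are lower than or equal to the case score, and the list of known scores is invented for illustration.

```python
from bisect import bisect_right


def percentile_rank(score: float, true_positive_scores: list[float]) -> float:
    """Percentage of true positive scores that are <= the case score."""
    ordered = sorted(true_positive_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical scores of ten known cases with the target condition
known_cases = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile_rank(45, known_cases))  # 40.0
```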
Another critical difference between a cutoff-based protocol and the post-analytical tool is the progressive scoring of individual results on the basis of two integrated dimensions: the degree of overlap between reference and disease range, a concept that was illustrated in an initial presentation of this series, and the degree of penetration within the portion of the disease range that does not overlap with the reference range. To illustrate this concept, we will again use VLCAD deficiency as a model. The column showing the actual results of this patient is highlighted on the right side. The analyte shown in the top row of this table is the acylcarnitine species C12:1, and the measured value is 0.50 nmol/mL. On the left side, the table shows that the 99th percentile of the reference population is 0.27 nmol/mL. Because the degree of overlap between reference and disease range is 34%, the disease-range percentiles up to and including the 30th (0.25 nmol/mL, which still falls below the 0.27 nmol/mL reference limit) lie within the overlap and do not generate a score. Beyond that point, each disease-range percentile threshold that is lower than the patient’s value of 0.50 does generate a score: C12:1 contributes to the cumulative score of this case by exceeding the 40th, 50th, 60th, and 70th percentiles of the disease range. The same process takes place for every informative marker: C14, with a degree of overlap of only 4%, generates a score starting from the 5th percentile of the disease range up to the 70th percentile; C14:1, with no overlap, generates a score immediately after exceeding the 1st percentile of the disease range and continues to do so all the way up to the 80th percentile. This action is repeated for all remaining analytes and ratios simultaneously, and that is why this process is called a parallel algorithm: there are no dependencies (the “and/or” rules) between one analyte and another.
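The rule that only percentiles beyond the overlap are scored can be sketched as a simple filter. Apart from the 30th percentile (0.25 nmol/mL), the 34% overlap, and the patient value of 0.50 nmol/mL, the disease-range percentile values below are invented placeholders, chosen only so that the example reproduces the four percentiles named in the text; the function name is likewise hypothetical.

```python
def scored_percentiles(value: float,
                       disease_percentiles: dict[int, float],
                       overlap_pct: float) -> list[int]:
    """Disease-range percentiles that generate a score for a "high" marker.

    A percentile counts only if it lies beyond the overlap with the
    reference range and its value is lower than the patient's result.
    """
    return [p for p, threshold in sorted(disease_percentiles.items())
            if p > overlap_pct and threshold < value]


# Hypothetical C12:1 disease-range percentiles (nmol/mL); only the 30th is from the text
c12_1 = {1: 0.05, 5: 0.10, 10: 0.14, 20: 0.20, 30: 0.25, 40: 0.30,
         50: 0.36, 60: 0.41, 70: 0.47, 80: 0.55, 90: 0.70, 99: 1.10}
print(scored_percentiles(0.50, c12_1, overlap_pct=34))  # [40, 50, 60, 70]
```

With `overlap_pct=0`, as for C14:1, scoring would start at the 1st percentile instead.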
Different scoring models are available, but a detailed discussion of them is beyond the scope of this presentation.
The scoring model of this particular tool is the increasing type, the one most commonly used. In this model, the first percentile crossed beyond the overlap with the reference range counts for a score of 1, the second for a score of 2, with increments of 1 up to a maximum of 12. In this case, the contribution of C12:1 to the cumulative score is 10, the sum of 1+2+3+4. All individual scores are then added together, ready for the final modification based on correction factors.
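The increasing scoring model can be written in a few lines. Interpreting "a maximum of 12" as a cap on the score of each crossed percentile is an assumption of this sketch, as is the function name.

```python
MAX_STEP_SCORE = 12  # assumed cap on the score of any single crossed percentile


def increasing_score(n_crossed: int) -> int:
    """Increasing model: the k-th crossed percentile scores k, capped at 12."""
    return sum(min(k, MAX_STEP_SCORE) for k in range(1, n_crossed + 1))


# C12:1 crosses the 40th, 50th, 60th and 70th percentiles: 1 + 2 + 3 + 4
print(increasing_score(4))  # 10
```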
As the final step, proportional correction factors are applied to convert the preliminary scores to a final value. These factors are weighted to reflect the degree of overlap between reference and disease ranges, as best documented by the plot by condition, and are unique to the condition under consideration. The final score is equal to the sum of all individual scores, each multiplied by the respective correction factor.
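Reading the final score as a weighted sum gives the sketch below. The correction factor values are hypothetical placeholders; the actual factors are condition-specific and derived from the degree of overlap, and only the C12:1 score of 10 comes from the example above.

```python
def final_score(scores: dict[str, float], factors: dict[str, float]) -> float:
    """Sum of each marker's preliminary score times its correction factor."""
    return sum(score * factors[marker] for marker, score in scores.items())


# C12:1 score from the text; the other scores and all factors are hypothetical
scores = {"C12:1": 10, "C14": 30, "C14:1": 45}
factors = {"C12:1": 0.5, "C14": 1.0, "C14:1": 2.0}
print(final_score(scores, factors))  # 125.0
```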
The post-analytical tools also incorporate a series of rules that serve different purposes. Differentiator rules prevent generating a score driven by significant abnormalities of non-informative markers that are used to calculate ratios; this may happen in cases actually affected with a different condition. Outlier rules prevent generating a score driven by significant abnormalities of non-informative markers per se, that is, markers not used to calculate ratios; this requirement is unique to the all conditions tool that will be presented in the next segment. Filters prevent skewing of the distribution of scores and percentile rankings by absurd values, data entry errors, and also unavoidable true negative cases. Filters are needed because of the web-based, always up-to-date nature of the tools, where any new information uploaded to the database is incorporated instantly into the corresponding tool.
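A differentiator rule might look like the sketch below, which suppresses a ratio's score when the non-informative marker used to calculate it is itself outside its reference range. The function, its parameters, and the example values are hypothetical; how R4S actually encodes these rules is not described here.

```python
def apply_differentiator_rule(ratio_score: float,
                              marker_value: float,
                              ref_low: float,
                              ref_high: float) -> float:
    """Zero a ratio's score if its non-informative marker is abnormal.

    marker_value is the non-informative analyte used to calculate the
    ratio; ref_low and ref_high bound its reference range.
    """
    if marker_value < ref_low or marker_value > ref_high:
        return 0.0  # the ratio may be driven by this marker, not the target condition
    return ratio_score


print(apply_differentiator_rule(15.0, marker_value=2.0, ref_low=5.0, ref_high=40.0))   # 0.0
print(apply_differentiator_rule(15.0, marker_value=12.0, ref_low=5.0, ref_high=40.0))  # 15.0
```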
After this brief overview of the basic concepts behind the post-analytical tools, it is time to introduce the first one: the one condition tool. Like all the others, the secondary page with links to the different groups of tools is accessed directly from the home page.
If a user selects the one condition tool link, the following page is displayed, showing at the top the link to the next window, where the tools are listed.
One condition tools are grouped by condition type. In the MS/MS application there are three groups: amino acid disorders (left), fatty acid oxidation disorders (center), and organic acid disorders (right). Each row is an active link that follows a consistent nomenclature and format. To illustrate these features, we will use the tools for the three conditions used previously as models: MSUD, Cbl C,D, and VLCAD deficiency.
The tool name consists of four elements: the abbreviated name of the condition, the tool version (in this case, the current version of the VLCAD tool is #15), the date it was released (in a year-month-day format), and the type of tool. All other tools are named following the same format.
There are six components of the one condition tool and they are listed here.
The first one is the data entry window, shown here for the three model conditions. In the lower section, the data entry fields are empty and are arranged in three groups: low markers, only shown in the Cbl C,D tool on the right side, differentiators, and high markers. These analytes correspond to the biochemical phenotype established by the respective plot by condition.
Data can be entered either manually or, preferably, by uploading a comma-separated values (CSV) file. The score is calculated by clicking the Calculate button at the lower end of the window.
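Preparing data for upload can be sketched with Python's standard `csv` module. The file layout assumed here (a `case_id` column followed by one column per analyte, one row per case) is hypothetical; the layout actually accepted by the tool is defined on the R4S website.

```python
import csv
import io


def read_cases(csv_text: str) -> dict[str, dict[str, float]]:
    """Parse one row per case into {case_id: {analyte: value}}."""
    cases = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        case_id = row.pop("case_id")
        cases[case_id] = {analyte: float(value) for analyte, value in row.items()}
    return cases


sample = "case_id,C12:1,C14,C14:1\nA001,0.50,0.85,1.20\n"
print(read_cases(sample))
# {'A001': {'C12:1': 0.5, 'C14': 0.85, 'C14:1': 1.2}}
```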
The first element to be displayed at the top of the report page is the tool banner. In addition to the information included in the name of the tool, the banner also shows the date and time the tool was used, the date of the last modification, the site affiliation and the name of the user who calculated the score.
The next element of the tool is a tabular summary of the reference and disease ranges of the informative analytes. From left to right, the columns show the analytes and ratios, sorted in the three groups mentioned earlier for the data entry window (low markers, differentiators, and high markers);
The limit of the reference range on the side closest to the disease range;
The degree of overlap between the reference range and the condition-specific disease range, sorted from least to most overlap (that is, in decreasing order of clinical significance);
A selection of disease range percentiles, superimposed on a shaded background when overlap is present;
And the actual patient results. Ratios are calculated by the tool and do not need to be entered in the data entry window.
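Because ratios are derived rather than entered, they can be computed from the analyte values, as in this sketch. The function name, the `C14:1/C12:1` key format, and the example values are hypothetical.

```python
def add_ratios(analytes: dict[str, float],
               ratio_defs: list[tuple[str, str]]) -> dict[str, float]:
    """Return a copy of the analyte values with the requested ratios added."""
    result = dict(analytes)
    for numerator, denominator in ratio_defs:
        result[f"{numerator}/{denominator}"] = analytes[numerator] / analytes[denominator]
    return result


case = {"C12:1": 0.50, "C14:1": 1.20}
print(add_ratios(case, [("C14:1", "C12:1")]))
```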
The next section is the plot by condition, showing for each analyte the patient result, the percentiles of the disease range, and the 1st to 99th percentile range of the reference population.
While the top and middle sections of the tool report are descriptive in nature, the bottom section, called Score Report, is where the calculated score is presented, compared, and interpreted.
There are three components of the score report. In the top left section, a table summarizes the case score calculated using disease ranges derived from the entire set of available cases (shown as “all” on the left side), and the scores calculated using only cases from a single country and only from the user’s own site. The scores are also shown as percentile ranks in comparison to the true positive cases of each set. The actual number of cases is shown at the bottom. In addition, this section includes a link called View Calculations.
By clicking View Calculations, it is possible to review the contribution of every analyte and ratio to the calculated score. This is a critical feature of the post-analytical tools: it documents that scoring is a transparent process, based only on objective evidence that is fully accessible to users.
The second element of the score report is the comparison plot. In this figure, scores from the true positive database are shown as individual black lines. The same distinction between entire project, single country, and single site is reproduced here. In each column, the calculated score of the case under consideration is shown as a red diamond.
The final element of the score report of the one condition tool is a textual description of the interpretation guidelines. More than the actual value, the interpretation of the cumulative score is based on its percentile rank when compared to the population of true positive cases. A score lower than the 1st percentile of all VLCAD scores (in this case, a value of 30) is considered NOT informative for VLCAD deficiency; a score between the 1st and 10th percentiles is considered possibly informative; a score between the 10th and 25th percentiles is likely to indicate VLCAD deficiency; and finally, a score above the bottom quartile, in other words greater than the 25th percentile of all scores, is considered very likely to be consistent with a diagnosis of VLCAD deficiency. Although these thresholds are used consistently in all the tools available to users from every site, the so-called “General tools”, the selection of interpretation guidelines, and of any additional text shown at the top of this section, is entirely up to the user who creates or modifies a tool using the Tool Builder.
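The interpretation guidelines map a percentile rank to one of four categories. A minimal sketch follows; how the tool assigns a rank falling exactly on the 10th or 25th percentile is not stated, so the boundary handling here is an assumption.

```python
def interpret(rank: float) -> str:
    """Map a percentile rank among true positive scores to a guideline category."""
    if rank < 1:
        return "not informative"
    if rank < 10:
        return "possibly informative"
    if rank <= 25:
        return "likely"
    return "very likely"


for rank in (0.5, 5, 20, 60):
    print(rank, interpret(rank))
```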
The clinical utility of the one condition tool can be summarized as follows. First, the one condition tools allow the complete replacement of ANALYTE cutoffs with a single CONDITION-specific threshold of clinical significance, which is the 1st percentile of all scores. Second, isolated, random abnormalities that do not fit the pattern defined by the plot by condition are filtered out as analytical noise. Third, the interpretation of a profile is driven by the percentile rank of a patient in comparison to all true positive cases in the R4S database, a type of clinical validation that is dynamic, not static, as it evolves practically on a daily basis. Finally, a tool has been released into production for every condition detected by tandem mass spectrometry with at least 3 submitted cases (ranging between 3 and 1,301 cases).
It is now time to move on to the second type of tool, the dual scatter plot. Unlike the one condition tool, which seeks a yes or no answer, the dual scatter plot can resolve the differential diagnosis between two conditions with similar and overlapping biochemical phenotypes.
Once again we use VLCAD and VLCAD carrier status as the model to demonstrate how the dual scatter plot actually works. A side by side comparison of the two plots by condition is helpful to recognize the crucial role of two analytes, C14:1 and C12:1, and one ratio, C14:1 to C12:1.
This slide shows a simplified version of the plot by condition, where only the three markers highlighted in the previous slide are displayed. Using a plus/minus evaluation, the plots show that the disease range of C14:1 is markedly elevated in VLCAD (three pluses, no overlap with the reference range) but also moderately to markedly elevated in VLCAD carriers (shown as two pluses). Surprisingly, C12:1 behaves differently, with a higher range and consequently less overlap in carriers. Putting these findings together, it becomes apparent that the C14:1 to C12:1 ratio is the key differentiator between the two conditions.
To appreciate how the dual scatter plot segregates VLCAD cases and VLCAD carriers, shown here in bright red and orange, respectively, it is useful to first recall the underlying rules of the one condition tool. In the case of a high marker, results below the upper limit of the reference range do not contribute to the score. On the other hand, any result above the 99th percentile of the respective reference range generates a score that is proportional to the degree of penetration within the portion of the disease range that does not overlap with the reference range.
In the dual scatter plot, the rules are quite different. First, the relationship to the reference range becomes irrelevant, as the comparison now takes place between two disease ranges. If the result falls within the range of overlap, there is no score modification. However, if the result is either below or above the area of overlap, it triggers a score modification that is proportional to the degree of separation from the disease range of the other condition.
The dual scatter plot is actually the combination of two tools. The first, shown on the left side, uses any non-overlapping result to increase the score for VLCAD and to decrease the score for VLCAD carrier status. The other tool, shown on the right side, operates in exactly the opposite way. A result within the overlap range triggers no modification. A result above the overlap range increases the score of the tool on the left and has the opposite effect on the tool on the right. A result below the overlap range reverses the impact on the two sides. So, a completely normal C14:1 to C12:1 ratio is actually very informative for the desired differential diagnosis, even though the same result would not trigger any score in a one condition tool for either condition.
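The symmetric effect of a single result on the two tools can be sketched as follows. Treating the modification as linearly proportional to the distance from the overlap, scaled by a `weight` parameter, is an assumption of this sketch, as are the overlap limits in the example; the text only states that the modification is proportional to the degree of separation.

```python
def dual_modification(value: float, overlap_low: float, overlap_high: float,
                      weight: float = 1.0) -> tuple[float, float]:
    """Return (change to tool 1 score, change to tool 2 score).

    Tool 1 favors condition 1 (here VLCAD), whose disease range extends
    above the overlap; tool 2 favors condition 2 (VLCAD carriers).
    """
    if value > overlap_high:
        d = weight * (value - overlap_high)
        return d, -d
    if value < overlap_low:
        d = weight * (overlap_low - value)
        return -d, d
    return 0.0, 0.0  # within the overlap: no modification


# Hypothetical overlap of the two disease ranges for the C14:1/C12:1 ratio
print(dual_modification(3.0, 2.0, 4.0))  # (0.0, 0.0)
print(dual_modification(6.0, 2.0, 4.0))  # (2.0, -2.0)
print(dual_modification(1.5, 2.0, 4.0))  # (-0.5, 0.5)
```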
Another characteristic of the dual scatter plot is a different way of expressing the calculated scores. Instead of absolute values, the Y-axis uses a min-max normalization, so that all scores are kept between 0 and 100. Each result is calculated by subtracting the lowest of all scores from the score, dividing by the range of values (highest minus lowest), and multiplying by 100. As shown on the right side, this formula preserves the relative distance between values and is ideal for achieving consistency among tools comparing any two conditions with different numbers of informative markers.
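The min-max formula translates directly into code. The handling of a set in which all scores are equal is a choice made for this sketch and is not described in the text; the scores in the example are hypothetical.

```python
def min_max(scores: list[float]) -> list[float]:
    """Rescale scores to 0-100: 100 * (score - lowest) / (highest - lowest)."""
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0.0] * len(scores)  # arbitrary choice when all scores are equal
    return [100 * (s - lowest) / (highest - lowest) for s in scores]


print(min_max([20, 45, 70, 120]))  # [0.0, 25.0, 50.0, 100.0]
```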
The impact of these rules is best demonstrated by showing what happens to the two sets of scores when the rules are applied. First, we will consider the tool designed to favor VLCAD and penalize VLCAD carriers, all shown here as clusters of yellow circles before the rules are applied. In min-max mode, the Y-axis now goes from 0 to 100. The scores, and more importantly the segregation between the two groups, change quite dramatically after the rules have been activated.
The effect in the other tool, the one favoring VLCAD carrier status, is shown in this slide. Again, there is a significant improvement in the separation between the two groups.
When the two tools with active rules are combined, the final result is a dual scatter plot.
The output of the dual scatter plot is a visual separation of the combined scores into four quadrants. The lower right quadrant includes the cluster of cases with condition 1, those with a high score in one tool and a low score in the other. The upper left quadrant includes the scores of cases with condition 2. A score located in the upper right quadrant is equivalent to an inconclusive result, meaning that both conditions are still possible. Finally, a score in the lower left quadrant excludes both conditions. When the tool is used to investigate an unknown case, the coordinates of the combined scores of that particular case are shown as a red diamond.
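Assigning a case to a quadrant can be sketched as below, with the tool 1 score on the horizontal axis and the tool 2 score on the vertical axis. The dividing line at 50 on the min-max scale is a hypothetical placeholder; where the tool draws the quadrant boundaries is not stated.

```python
QUADRANT_SPLIT = 50.0  # hypothetical boundary on the 0-100 min-max scale


def quadrant(tool1_score: float, tool2_score: float,
             split: float = QUADRANT_SPLIT) -> str:
    """Classify a pair of normalized scores into one of four quadrants."""
    high1 = tool1_score >= split  # x-axis: tool favoring condition 1
    high2 = tool2_score >= split  # y-axis: tool favoring condition 2
    if high1 and not high2:
        return "lower right: condition 1"
    if high2 and not high1:
        return "upper left: condition 2"
    if high1 and high2:
        return "upper right: inconclusive"
    return "lower left: both conditions excluded"


print(quadrant(85, 10))  # lower right: condition 1
```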
Access to the dual scatter plot is from the same page that lists all post-analytical tools.
Tools are labeled in a manner consistent with the nomenclature of the one condition tools: name of condition 1, name of condition 2, version number of the tool for condition 1, version number of the tool for condition 2, a D or a U within square brackets to indicate the inclusion of at least one marker that is affected by the use of a derivatized or underivatized method, and the date when the combined tool was created.
Data entry is identical to the one condition tool, either an automated or a manual process, completed by clicking the Calculate icon.
The dual scatter plot available on the R4S website to distinguish between VLCAD cases and VLCAD carriers is shown here. The score of this case, shown as a red diamond, is clearly located within the VLCAD cluster, far away from either the VLCAD carriers or the inconclusive and negative quadrants.
In addition to the plot, the report shows the two calculated scores, the respective percentile rank, and the count of cases in each group.
The report of the dual scatter plot also includes interpretation guidelines for each of the quadrants. In our experience, the dual scatter plot has greatly improved the differential diagnosis between VLCAD deficiency and VLCAD carrier status, preventing many unnecessary referrals and follow-up tests. Inconclusive results should be treated as indicative of VLCAD deficiency and receive appropriate follow-up testing.
This concludes the third part of the R4S series of Mayo Medical Laboratories Hot Topics. In part IV, we will introduce the two high-throughput data entry portals available in R4S to perform the simultaneous calculation of all scores, either for a single case (the all conditions tool) or for large batches of cases, for example a 96-well plate. This second functionality is named the tool runner.
Please do not hesitate to contact us if you have any questions or requests related to the content of this presentation. Thank you very much for your attention.
Thank you for the introduction. This presentation is the third segment of a 6-part series describing the products and clinical tools of a newborn screening quality improvement project called Region 4 Stork, or R4S. The title of this presentation is “Post-Analytical Interpretive Tools”.