Raising the Bar on Newborn Screening Test Performance

Robert Guthrie, M.D., Ph.D. Photo credit: PKU Test.com

Routine screening of all newborns for inherited disorders began in the 1960s after American microbiologist Robert Guthrie, M.D., Ph.D., developed a simple test to identify babies with the genetic disorder phenylketonuria (PKU) so that these infants could receive treatment before they developed disabling symptoms. As newborn screening became more widely available as a public health program, more conditions were added for which a screening test and early treatment were available.

Typically, on the second day of life while still in the hospital, a newborn has his/her heel pricked, and blood is collected on special filter paper. The filter paper is sent to a laboratory, usually the public health laboratory in the state in which the child is born, where the blood is then tested.

What Testing Is Performed on a Newborn’s Blood?

There remains much controversy over which conditions should be included in a newborn screening panel. The list varies from state to state, even though the current Recommended Uniform Screening Panel (RUSP) endorsed by the U.S. Department of Health and Human Services (HHS) specifies 34 core conditions.

The RUSP was essentially triggered by improvements in a technology called tandem mass spectrometry, which suddenly allowed rapid and simultaneous analyses of amino acids and acylcarnitines (fatty acids) for the detection of more than 40 inborn errors of metabolism—without needing additional blood collected from a newborn.

“It was a huge improvement when the RUSP established that every child in the country should be screened for the same disorders,” says Silvia Tortorelli, M.D., Ph.D., Co-Director of the Mayo Clinic Biochemical Genetics Laboratory.

“Before the RUSP, how kids were screened depended on ‘miles,’ meaning some states screened for 5 disorders, and other states screened for more than 30.”

Although the RUSP made newborn screening more consistent across states, the effect didn’t last.

Newborn Screening: A Stately and Emotional Issue

“The problem now is that some states are adding conditions that are not recommended by the RUSP,” says Piero Rinaldo, M.D., Ph.D., Co-Director of the Mayo Clinic Biochemical Genetics Laboratory. “So we are slipping back into a state of heterogeneity and chaos.”

This move away from uniform screening throughout the country has been fueled by parent advocacy groups. Such groups are well-intentioned and capable of doing great work—newborn screening’s very origins are owed to these groups. But some groups advocate on raw emotion because the members’ own children have been affected by a specific condition.

“Very recently, we’ve had what seems like a throwback to the old days where some parent advocacy groups have directly petitioned politicians to add a condition to the newborn screening panel,” says Amy L. White, CGC, Genetic Counselor in the Biochemical Genetics Laboratory. “These groups want a particular disorder added, even though the genetics community has rejected it because there really isn’t a treatment for it yet—or the test isn’t complete—and the disease doesn’t meet the criteria for being added to the RUSP. Ethically, those conditions shouldn’t be on the panel.”

Adding Conditions Adds Challenges

It’s not as if the RUSP is static or out of date. The HHS Secretary’s Advisory Committee on Heritable Disorders in Newborns and Children created a mechanism to propose additional conditions for inclusion in the RUSP and to evaluate proposed conditions via an independent external evidence review. In this way, the list is continually reviewed and modernized. For example, over the last several years, severe combined immunodeficiencies (SCID), Pompe disease, mucopolysaccharidosis type I (MPS I), and X-linked adrenoleukodystrophy (X-ALD) have been added to the RUSP.

Yet, adding such conditions also means adding new challenges. Additional screening may require new techniques and new metrics that some state laboratories may not have the experience, or the funding, to easily implement. Any changes will take time to implement and could exacerbate existing inconsistencies. The complexity of implementing additional screening has led some states to outsource this type of testing.

“There are discrepancies among states, so not every baby receives what we would call the ‘standard of care,’” says Dr. Rinaldo.

“The problem is, now that five more conditions have been added to the original RUSP, the response rate by each state is very different. There are some states that act quickly, while other states lag behind because they need more equipment or more people,” he added.

Are Your State Health Officials Well-Informed?

Further, some hospitals and pediatricians are not familiar with what tests are required in their own states. Parents who are concerned about a condition or conditions not screened in their state have the option to purchase supplemental newborn screening through private laboratories.

“I actually had to order (and buy) extra newborn screening tests for one of my children because our state (Missouri) wasn’t going to implement expanded screening until six months after my baby was due,” says Dawn Peck, LCGC, now a Genetic Counselor at the Biochemical Genetics Laboratory.

Often, the only information parents are given is a packet, or a flyer or two, about standard newborn screening, which may not include information about supplemental screening options. “I knew supplemental screening was available because of what I do for a living,” says Peck. “It wasn’t something where my health care provider said, ‘Hey, would you like to order these extra tests for your baby?’ I had to ask for them.”

The False-Positive Problem

For the team at the Mayo Clinic Biochemical Genetics Laboratory, as more conditions are added to the RUSP, the overarching priority is test performance and minimizing false positives, which still occur too often among state health laboratories. False positives can disrupt the delicate first weeks of life when parents should be bonding with their newborns, and they can cause lasting emotional stress.

The definition of a false-positive result can vary across states and laboratories, so exact comparisons can be difficult. In some states, abnormal results that trigger a request for a repeat sample are not considered false positives if the second sample yields normal results. However, because this requires patient contact and additional specimen collection, the Biochemical Genetics Laboratory at Mayo Clinic defines a false positive for newborn screening as “any reported result, later found to be normal, that requires additional patient contact.”

Take Minnesota, where Mayo Clinic is headquartered. For many years, Mayo Clinic was contracted by the state to provide part of the newborn screen. In 2013, the Biochemical Genetics Laboratory’s false-positive rate for this testing was 0.024%. By comparison, according to recent data, the average false-positive rate in the United States is 0.5%, which means Mayo’s false-positive rate is roughly 20 times lower than average.
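The two rates quoted here can be compared directly. The sketch below uses only the 0.024% and 0.5% figures from the text; the per-100,000 framing and the function name are added for illustration and are not from the source.

```python
# Rates quoted in the article, as percentages of screened newborns.
mayo_fp_rate_pct = 0.024     # Mayo Clinic testing for Minnesota, 2013
national_fp_rate_pct = 0.5   # reported U.S. average


def false_positives_per_100k(rate_pct: float) -> float:
    """Convert a percentage rate into expected false positives per 100,000 births."""
    return rate_pct / 100 * 100_000


# How many times higher the national average is than the Mayo rate.
ratio = national_fp_rate_pct / mayo_fp_rate_pct

print(f"Mayo:     {false_positives_per_100k(mayo_fp_rate_pct):.0f} per 100,000")
print(f"National: {false_positives_per_100k(national_fp_rate_pct):.0f} per 100,000")
print(f"Ratio:    {ratio:.1f}x")
```

Because laboratories define a false positive differently, a comparison like this is only as good as the definitions behind the two rates.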

“We want to minimize false positives while finding true positives,” says White. “Most clinical genetics health care workers know that there are a lot of false positives still, which makes parents unhappy. They don’t want to have to come back to a hospital, maybe stay overnight, and have a whole round of expensive tests done on their baby for nothing.”

This kind of scenario is known as vulnerable baby syndrome.

“There is some lasting damage that occurs from false-positive results,” adds White. “I’ve seen that directly with my own clients, how the stress on parents and families can really linger.”

False positives are so prevalent that doctors can even grow immune to them. “It can desensitize the family physician who sees an abnormal result and may assume it is another false positive,” says Dr. Tortorelli. “When, in actuality, it could be a true positive that needs attention right away.”

Mayo Clinic receives abnormal results from all over the country and considers second-tier testing critical to distinguishing a false positive from a true positive. Second-tier tests are typically more sensitive and specific than the primary newborn screening assay, but for various reasons, including cost, time, and complexity, they are not suitable to be used as primary screening assays. By performing a second test when the primary screening results are abnormal, the positive predictive value of newborn screening can be greatly improved.

“We are big advocates of second-tier testing because these tests are much more sensitive and specific, and they do not require a new blood specimen,” explains Dr. Tortorelli. “But they can’t be used for initial screening because these tests take longer to perform and should only be done on that small percentage of samples.”

Some state laboratories perform second-tier testing, while others do not.
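The effect of second-tier testing on positive predictive value can be illustrated with a small calculation. Every number below is hypothetical and chosen only to make the arithmetic easy to follow. The sketch also assumes, as a simplification, that second-tier testing clears false positives without losing any true positives.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of flagged newborns who actually have the condition."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no flagged results")
    return true_pos / flagged


# Hypothetical cohort (not from the article).
screened = 100_000
affected = 20                  # true positives, all assumed to be flagged
primary_fp_rate = 0.005        # 0.5% of unaffected newborns flagged by the primary screen
second_tier_clears = 0.90      # assumed share of those false positives resolved as normal

primary_fp = round((screened - affected) * primary_fp_rate)
remaining_fp = round(primary_fp * (1 - second_tier_clears))

ppv_primary_only = positive_predictive_value(affected, primary_fp)
ppv_with_second_tier = positive_predictive_value(affected, remaining_fp)

print(f"PPV, primary screen only: {ppv_primary_only:.1%}")
print(f"PPV, with second tier:    {ppv_with_second_tier:.1%}")
```

Since the second-tier test runs on the original blood spot, the improvement in this sketch comes without any additional patient contact.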

Harmonizing Newborn Screening in the U.S. and Abroad

To mitigate this national (and international) problem of false positives and to raise the bar on test performance, Mayo Clinic physicians and scientists developed a newborn screening project called Region 4 Stork (R4S) in 2004. The R4S tools were designed to promote improved laboratory quality of newborn screening via tandem mass spectrometry. The project offers freely available, on-demand access to post-analytical tools designed to interpret analyte profiles of a single case.

The basic tenet of R4S’s software is that an abnormal result is not defined exclusively as a deviation from normal. The software also evaluates how consistent a result is with the disease range for each condition, an assessment that is more informative than a traditional “one-size-fits-all” cutoff value. This was made possible by a massive database of true-positive cases, assembled through an unprecedented level of cooperation and collaboration on a global scale.
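The difference between a single cutoff and a comparison against both ranges can be sketched in a few lines. This is an illustration of the idea only, not the R4S algorithm; the ranges, the cutoff, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def cutoff_only(value: float, cutoff: float) -> bool:
    """Traditional rule: anything above the cutoff is flagged."""
    return value > cutoff


def dual_range_call(value: float, reference: Range, disease: Range) -> str:
    """Ask both questions: is it abnormal, and does it look like the disease?"""
    in_reference = reference.contains(value)
    in_disease = disease.contains(value)
    if in_disease and not in_reference:
        return "consistent with disease"
    if in_disease and in_reference:
        return "overlap zone"
    if in_reference:
        return "within reference range"
    return "outside reference range, not in disease range"


# Hypothetical analyte ranges in arbitrary units.
reference = Range(1.0, 10.0)
disease = Range(25.0, 80.0)

for value in (8.0, 12.0, 40.0):
    print(value, cutoff_only(value, cutoff=10.0), dual_range_call(value, reference, disease))
```

In this toy example the value 12.0 is flagged by the cutoff rule but does not fall in the disease range, which is the kind of result a cutoff alone would report as abnormal.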

“The goal behind R4S was to have at least 50 cases of every condition in the recommended (RUSP) panel,” says Dr. Rinaldo. “I think, with one or two exceptions, we are there. And for some diseases, we have thousands of cases.”

The database for R4S has grown to include cases contributed from more than 154 public health programs and private laboratories from 64 countries. And because the database has such meaningful clinical utility, a lot of newborn screening laboratories are now using R4S in their day-to-day operations.

In a 2014 study led by Mayo Clinic, performance metrics of the tools of R4S were compared to the outcome based on cutoff values for select analytes in the California screening program. The study, published in Genetics in Medicine, was a retrospective review of the outcome of 176,186 babies born in California between January 1 and June 20, 2012. Remarkably, the study concluded that R4S tools, second-tier tests, and other evidence-based interpretation rules could have brought the false-positive rate to as low as 0.02%, reducing false-positive cases by up to 90% in California.

More recently, Mayo Clinic has introduced a refined version of R4S, called Collaborative Laboratory Integrated Reports (CLIR), which already has 54 sites worldwide contributing data.

“If we think of R4S as one-dimensional, we can now go three-dimensional with CLIR,” says Dr. Rinaldo. “Now, we can adjust the results for a number of demographic covariates, which has been a problem for many standard newborn screening laboratories. CLIR can take values and correct them for age, birth weight, and sex. These variables can make a huge difference—that’s why other labs tend to have different cutoff values, particularly for premature babies. ‘Preemies’ are where the bulk of false positives happen.”
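One common way to account for a covariate such as birth weight is to compare each result with the typical value for similar babies rather than with a single population-wide number. The sketch below shows that general idea using multiples of a band-specific median. It is not CLIR's method, and the bands, medians, and names are all hypothetical.

```python
# Hypothetical reference medians for one analyte, by birth-weight band.
REFERENCE_MEDIANS = {
    "<1500 g": 4.0,
    "1500-2499 g": 3.0,
    ">=2500 g": 2.0,
}


def birth_weight_band(grams: float) -> str:
    """Assign a newborn to one of the hypothetical birth-weight bands."""
    if grams < 1500:
        return "<1500 g"
    if grams < 2500:
        return "1500-2499 g"
    return ">=2500 g"


def multiple_of_median(value: float, grams: float) -> float:
    """Express a result relative to the median for babies of similar weight."""
    return value / REFERENCE_MEDIANS[birth_weight_band(grams)]


# The same raw value reads very differently for a premature and a term baby.
raw_value = 5.0
print(multiple_of_median(raw_value, grams=1200))  # compared with the <1500 g median
print(multiple_of_median(raw_value, grams=3400))  # compared with the >=2500 g median
```

With these made-up medians, a raw value that is only modestly elevated for a very small baby would look strongly elevated if judged against term babies, which is how a single fixed cutoff can over-flag premature infants.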

While CLIR is freely available for laboratories to use, Dr. Rinaldo’s team has only one stipulation—that users openly contribute their data.

CLIR is what prompted White to bring her extensive experience as a genetic counselor to the Biochemical Genetics Laboratory several years ago.

“It’s been amazing to watch the development of CLIR, which started out as R4S,” says White. “CLIR houses so much newborn screening data from multiple states and countries—and from our own laboratory—that we’re able to differentiate truly affected babies from unaffected babies. So, if you compare Mayo Clinic to any state lab that offers newborn screening through public health programs, Mayo has the lowest rate of false positives, which is awesome.”

She adds, “We want to provide these families and their newborns with the most accurate test results we can. Newborn screening can and will continue to save children’s lives and improve their quality of life as a result. If we can remove that worry that parents get with false positives, we want to achieve that.”

The CLIR Tool: At a Glance

The single-condition tool integrates complex patterns of test results into a single score. The score is assessed against a threshold of clinical significance, and when found informative, also represents a degree of likelihood of disease, shown as a percentile rank in comparison to confirmed positive cases. In this example, a patient is being assessed using the tool designed to identify the lysosomal disorder mucopolysaccharidosis type I (MPS I). The score is informative and is ranked at the 51st percentile when compared to five confirmed positive cases.

The underlying foundation of the single-condition tools is the recognition of which analytes and analyte ratios are most useful to make a biochemical diagnosis of a specific disease. Markers where there is the most separation between individuals with the disease and the reference population are the most informative and are given the highest weight in the scoring process. The single-condition tool includes a plot showing the values under examination (red diamonds) to visually demonstrate where a particular case falls in comparison to the reference and disease ranges established for each analyte.
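The general shape of such a tool, a weighted score followed by a percentile rank against confirmed cases, can be sketched as follows. This is not CLIR's scoring formula. The marker names, weights, threshold, and scores are hypothetical, and the sketch only scores elevations above the reference median, whereas a real tool would also handle markers that are informative when low.

```python
def weighted_score(values: dict, reference_medians: dict, weights: dict) -> float:
    """Sum each marker's elevation over its reference median, scaled by its weight."""
    score = 0.0
    for marker, value in values.items():
        ratio = value / reference_medians[marker]
        score += weights[marker] * max(ratio - 1.0, 0.0)
    return score


def percentile_rank(score: float, confirmed_scores: list) -> float:
    """Percentage of confirmed positive cases scoring at or below this case."""
    if not confirmed_scores:
        raise ValueError("need at least one confirmed case")
    at_or_below = sum(1 for s in confirmed_scores if s <= score)
    return 100.0 * at_or_below / len(confirmed_scores)


# Hypothetical inputs: marker_a separates the populations best, so it is weighted higher.
case_values = {"marker_a": 3.0, "marker_b": 1.5}
reference_medians = {"marker_a": 1.0, "marker_b": 1.0}
weights = {"marker_a": 2.0, "marker_b": 1.0}
confirmed_positive_scores = [2.0, 4.0, 5.0, 6.0, 9.0]
informative_threshold = 3.0

score = weighted_score(case_values, reference_medians, weights)
is_informative = score >= informative_threshold
rank = percentile_rank(score, confirmed_positive_scores) if is_informative else None

print(score, is_informative, rank)
```

The percentile rank is only computed once the score clears the threshold, mirroring the description above in which a score is first judged informative and then ranked against confirmed positive cases.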

The dual scatter plot is a tool designed to provide a differential diagnosis between two conditions with similar biochemical phenotypes. This is particularly helpful for the prevention of false-positive outcomes. In this case, the tool has plotted the current case (red diamond) along with data from confirmed positive MPS I samples and samples that were found to be false positives for MPS I. The ability to differentiate between these two populations increases the positive predictive value of the test under evaluation.

Chris Bahnsen

Chris J. Bahnsen covers emerging research and discovery for Mayo Clinic Laboratories. His writing has also appeared in The New York Times, Los Angeles Times, and Smithsonian Air & Space. He divides his time between Southern California and Northwest Ohio.