Reproducibility and Replicability in Research

By Stephanie Miceli [This story appeared in the Summer 2019 issue of The National Academies In Focus]

Imagine giving one recipe to 10 different chefs and getting 10 completely different results. This inconsistency could be due to any number of factors — variables that cannot be controlled, omission of details, or shortcomings in design and execution.

The same challenges apply to scientific experiments.

One of the ways that scientists confirm the validity of a new discovery is by repeating the research that produced it. When scientific results are frequently cited in textbooks and TED Talks, the stakes for validity are high. The stakes become even higher when the results inform policy, future scientific studies, or people’s health decisions.

A new National Academies report defines reproducibility and replicability and examines the extent of non-reproducibility and non-replicability. The report also provides recommendations to researchers, academic institutions, journals, and funders on steps they can take to improve reproducibility and replicability in science.

“It’s harder to gain recognition if your body of work is repeating what someone has already done, rather than exploring the new,” said Harvey Fineberg, president of the Gordon and Betty Moore Foundation and chair of the committee that conducted the study. “Over time, our hope is that when a scientist takes on or attempts replication — because the value of the result can outweigh cost, because a great deal weighs on scientific basis — those types of papers will get recognition in a scholar’s career.”

Consistent Definitions

Reproducibility and replicability are commonly used terms in the scientific community. However, some fields use the terms interchangeably, or even use the terms with opposing definitions. The committee that wrote the report said it’s important to distinguish these terms to unravel the complex issues associated with confirmation of previous studies.

Reproducibility is defined as obtaining consistent results using the same data and code as the original study (synonymous with computational reproducibility). Replicability means obtaining consistent results across studies aimed at answering the same scientific question using new data or other new computational methods.

One typically expects reproducibility in computational results, but expectations about replicability are more nuanced. A successful replication does not guarantee that the original scientific results of a study were correct, nor does a single failed replication conclusively refute the original claims.

Several factors can contribute to non-reproducibility or non-replicability, including previously unknown variation or effects, inadequate recordkeeping, technology limitations, potential biases, lack of training, institutional barriers, or, in rare cases, misconduct.

It is hard to quantify the extent of non-reproducibility or how much of science is reproducible. And while reproducibility and replicability are important for research, they are not the be-all and end-all, the committee emphasized.

“The goal of science is not to compare or replicate [studies], but to understand the overall effect of a group of studies and the body of knowledge that emerges from them,” said Fineberg.

Responsibility Starts with Researchers

Academic institutions, journals, conference organizers, funders of research, and policymakers can all play a role in improving the reproducibility and replicability of research. But that responsibility begins with the researchers themselves, who should operate with “the highest standards of integrity, care, and methodological excellence,” Fineberg said during a May 7 webinar. That responsibility extends to the institutions where they are trained and continue to practice their craft.

Important steps researchers can take include clearly and accurately describing their methods, conveying the degree of uncertainty in their results, using statistical methods properly, and avoiding hype in press releases or media coverage about their work.

No Crisis, But No Time for Complacency

Some news articles go so far as to declare a non-reproducibility and non-replicability “crisis” in science, but the committee doesn’t necessarily agree. Occasionally, non-replicability may even be helpful. For example, the discovery of new phenomena and the collection of new insights about variability both contribute to the self-correcting nature of science, and should not be interpreted as a weakness.

Nonetheless, improvements are needed: more transparency of code and data, for example, and more rigorous training and education in statistics and computational skills.

The report also recommends that journals and funders of research explicitly consider replicability and reproducibility in application and submission processes. This calls for a culture shift, so that it is in scientists’ best interest to submit replication studies and so that such papers become the norm.