The Reproducibility Crisis

Morgan McSweeney

As Alex Woodell recently described, the preclinical research community is in a quiet crisis. Somewhere between 50% and 90% of results from early-stage academic cancer research cannot be reproduced by industry scientists. Studies by several large multinational pharmaceutical companies and a number of independent research groups have confirmed what was formerly only a suspicion: many results generated by academic laboratories cannot be confirmed by industry scientists.

The process of taking a new candidate drug compound out of academic labs and advancing it through the phases of preclinical and clinical development can be staggeringly expensive. In an analysis of 10 approved cancer drugs, the median cost of research and development was $648 million. The same analysis found that the median revenue generated from those products was $1,658 million. However, these two figures do not mean that cancer drug development necessarily offers a roughly 2.5x return on investment; the numbers do not include the financial burden posed by failed projects. Of all the cancer therapeutics that enter phase 1 clinical trials (the first of 4 total phases), only ~5% end up as licensed drugs on the market.

The drug development process often starts with a deep review of the previous literature. Once a company finds a promising lead, it seeks to verify the original claims before beginning more expensive studies. Time and time again, however, industry scientists’ efforts end in frustration: they cannot arrive at the same conclusions as the original authors.

After years of encountering such roadblocks, researchers at Bayer, a multinational German pharmaceutical company, decided to study what portion of their leads end up being dead ends. Twenty-three Bayer scientists pooled data from 67 in-house projects, 47 of which were for cancer indications. They found that their results reproduced original data in only ~20-25% of those projects. For about two-thirds of the projects, there were discrepancies with the original data that led to increased time and money spent on verification studies. In 2011, similar studies were conducted by Amgen, a large American pharmaceutical company. Out of the 53 published studies they repeated, they were only able to verify the results of 6. The other 47 (89%) studies could not be confirmed with sufficient confidence to justify clinical progression.

The researchers wondered what was causing such dramatic differences between their studies and the original work. Even when Bayer scientists used the exact same cell lines or animal models as the original papers, inconsistencies were extremely common. In some cases, industry scientists contacted and visited the original study authors to be certain that they were carrying out the exact same experimental procedures, to no avail. The reproducibility of academic preclinical work was also not correlated with the prestige or impact factor of the journal in which it was published.

As a whole, there is no evidence to suggest that intentional academic misconduct is the reason for this irreproducibility. The factors underlying the irreproducibility of academic research are still largely obscure, but efforts to improve the quality of data generated in university settings will likely improve the external validity of results. Practices such as randomizing animals to treatment groups, masking (concealing) treatment allocations, pre-registering analysis plans, and conducting lab tests and analyses while blinded are standard in clinical trials. They must be, given the overwhelming evidence that even well-intentioned researchers can fall victim to accidental bias.
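To make the first two of those practices concrete, here is a minimal sketch of randomizing animals to treatment groups and masking the allocation behind opaque codes. The function name, animal labels, group names, and seed are all hypothetical and chosen only for illustration; a real study would follow its own registered protocol.

```python
import random


def randomize_and_mask(animal_ids, groups, seed):
    """Randomly assign animals to groups, then hide the groups behind codes.

    Hypothetical helper for illustration; not taken from any cited study.
    """
    rng = random.Random(seed)  # fixed seed so the allocation can be audited later

    # Shuffle the animals, then deal them out to the groups in turn.
    # This keeps group sizes balanced (they differ by at most one animal).
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    allocation = {
        animal: groups[i % len(groups)] for i, animal in enumerate(shuffled)
    }

    # Give each group an opaque code in random order, so that the code
    # itself does not reveal which group is the control.
    codes = [f"T{n}" for n in range(1, len(groups) + 1)]
    rng.shuffle(codes)
    key = dict(zip(groups, codes))

    # The blinded sheet lists animals in sorted order with codes only.
    blinded = {animal: key[allocation[animal]] for animal in sorted(allocation)}
    return allocation, blinded, key


# Hypothetical example: 12 mice, two arms.
animals = [f"mouse-{n:02d}" for n in range(1, 13)]
allocation, blinded, key = randomize_and_mask(
    animals, ["vehicle", "treatment"], seed=2024
)

for animal, code in blinded.items():
    print(animal, code)
```

In this sketch, the person running the experiment would see only the blinded sheet, while the key stays with someone who is not involved in dosing or measurement until the analysis is complete.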

Why aren’t these same practices the standard in academic experimental design? First, they are expensive: improving the structural quality of experiments slows down the process of data generation. Second, there is currently no good feedback system in place; once a paper is published, the question of whether that work is reproducible is rarely addressed. In the current system of incentives, the reproducibility of the work you publish takes a distant back seat to the impact factor of the journal in which it is published.

To say that improving preclinical experimental design and documentation would be expensive, however, is to speak from the perspective of the academic lab. For the scientific system as a whole, such efforts would likely provide a significant positive return on investment. It is estimated that approximately $56.4 billion is spent on preclinical research each year. If 50% of those results are not reproducible, then $28.2 billion has effectively been wasted. A study on the economics of irreproducibility suggested that of that $28.2 billion, $10.2 billion stems from unique or mischaracterized biological reagents and references, $7.8 billion is due to poor study design, $7.2 billion is due to flawed data analysis and reporting, and $3 billion is related to academic laboratory protocols.
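The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted above; the variable and function names are illustrative, and the percentage shares are derived from those quoted figures rather than taken from the underlying study.

```python
# Figures quoted in the text, in billions of US dollars per year.
TOTAL_PRECLINICAL_SPEND = 56.4
IRREPRODUCIBLE_FRACTION = 0.50

# Cost categories as listed in the text.
breakdown = {
    "biological reagents and references": 10.2,
    "study design": 7.8,
    "data analysis and reporting": 7.2,
    "laboratory protocols": 3.0,
}


def wasted_spend(total, fraction):
    """Spending attributed to results that cannot be reproduced."""
    return total * fraction


def shares(parts):
    """Each category's fraction of the sum of all categories."""
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


wasted = wasted_spend(TOTAL_PRECLINICAL_SPEND, IRREPRODUCIBLE_FRACTION)
category_shares = shares(breakdown)

print(f"Estimated irreproducible spend: ${wasted:.1f} billion")
for name, share in category_shares.items():
    print(f"  {name}: ${breakdown[name]:.1f} billion ({share:.1%})")
```

The four categories add up to the full $28.2 billion, with reagents and references as the largest single share at a little over a third.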

In contrast, the extremely high standards of evidence for research in human subjects are the result of regulations requiring expensive and difficult-to-implement checks and balances. However, clinical research has not always been a rigorous process. Transitioning from the current standard of preclinical research toward a more reproducible experimental design will carry inherent costs to individual researchers without providing immediately accessible benefits. This evolution, therefore, will likely require journals and institutions to raise the standards to which they hold authors. For example, journals may require researchers to pre-register all preclinical toxicity and efficacy studies in an immutable public registry or database in order to consider their data valid. This would help increase the publication of negative data and would also ensure that analyses are not fudged to find elements of significance in data sets that have already been collected.

The checks and balances of the scientific system currently do a good job of ensuring that drugs that make it all the way through clinical development have a positive benefit/risk ratio for patients, but the process could be made more efficient by improving the quality of data collected at the level of early, preclinical studies.
