Notes and comments

Are we reaching a turning point? Several recent papers have drawn attention to the poor design of many animal experiments. Some of them are briefly summarised below.

If it is now recognised that there is a problem, should we move on to the next step and do something about it?

But there is a dilemma. Better experimental design means fewer false-positive results. That may mean fewer exciting papers in high-impact journals, leading to less funding for the labs which are doing things well. Take, for example, the paper by Scott et al (see below). More than 50 papers have been published on drugs which appeared to increase lifespan in a mouse model of ALS, but only one works clinically. Scott et al suggest that virtually all of these are false positives. They have now screened 70 compounds using a much better experimental design and improved statistical methods, and all are negative. After five years of careful work they have no exciting positive papers to publish, yet their work is clearly of great value.
These points are specifically raised by Sarewitz (see below).
How should the funding organisations respond to such a challenge?

Below are comments on, or extracts from, several recent papers.

Funding organisations promote use of ARRIVE Guidelines (August 2012)
The heads of three major funding organisations (MRC, BBSRC and the Wellcome Trust) have written an open letter to university Vice-Chancellors, Principals and Heads of Divisions encouraging scientists to improve the design of experiments using animals and to use the ARRIVE guidelines. Click here for the press release.

The ‘3Is’ of animal experimentation (June 2012)
An editorial in Nature Genetics (44, 611, June 2012) strongly defends the use of animals in biomedical research, but states that “Careful experimental design and reporting are at least as important as attention to welfare in ensuring that the knowledge we gain justifies using live animals as experimental tools”. It recommends the ARRIVE guidelines but expresses concern that the emphasis on “Reduction” may be counterproductive, as it could lead to under-powered experiments resulting in more false negative results. It suggests that the reduction should instead come from better control of variation, so that group size can be reduced (in agreement with this web site). It goes on to state that: “In our opinion, the classic review by Sean Scott and colleagues (Amyotroph. Lateral Scler. 9, 4–15, 2008) should be required reading for anyone designing an animal experiment.” (see below).

“Raise standards for preclinical cancer research” (March 2012)
In a commentary article in Nature (483:532, 29 March 2012), C. Glenn Begley and Lee M. Ellis say that over the past decade scientists at Amgen have attempted to reproduce the results of 53 “landmark” studies in cancer research, sometimes with the assistance of the original authors, but could do so in only six (11%) cases. They consider that the problems arise from the use of a small number of tumour cell lines which do not recapitulate human disease, an inability to capture the human tumour environment, the use of problematic end-points, and a poor appreciation of pharmacodynamics and pharmacokinetics. They conclude that “Cancer researchers must be more rigorous in their approach to preclinical studies. Given the inherent difficulties of mimicking the human micro-environment in preclinical research, reviewers and editors should demand greater thoroughness.” (Interestingly, they do not mention faults in the design of the experiments, such as failure to randomise or blind, leading to bias.)

“Beware the creeping cracks of bias” (May 2012)
In a commentary article (Nature 485, 149, 10 May 2012) Daniel Sarewitz claimed that “Evidence is mounting that research is riddled with systematic errors. Left unchecked, this could erode public trust”. Early signs of bias were noted in the 1990s in clinical trials funded by the pharmaceutical industry, and attempts were made to reduce the problem by strict disclosure of conflicts of interest. However, he claims that the problem stems from a widespread belief “… that progress in science means the continual production of positive findings. All involved benefit from positive results, and from the appearance of progress. Scientists are rewarded both intellectually and professionally, science administrators are empowered and the public desire for a better world is answered. The lack of incentives to report negative results, replicate experiments or recognize inconsistencies, ambiguities and uncertainties is widely appreciated — but the necessary cultural change is incredibly difficult to achieve.” He concludes that the hype from universities and journals about specific projects should be reduced, and that collaboration between those involved in fundamental research and clinicians should be increased: “The first step is to face up to the problem — before the cracks undermine the very foundations of science.”
A second step might be to ensure that all experiments are well designed, as suggested in this web site!

An example of bias from conflicts of interest (2005)
(Abstracted from a paper by F. vom Saal and C. Hughes, Environ Health Perspect 113:926–933, 2005.)

Beware of conflicts of interest. Of 98 government-funded studies, 94 found that bisphenol A (BPA), a compound widely used in the production of rigid plastics, caused developmental abnormalities in rats or mice, whereas none of the 8 industry-funded studies found an effect. (These figures exclude the 13 studies done using the CD:SD stock, which is resistant to the effects of diethylstilboestrol and similar compounds.)

Presumably any government-funded research which failed to find an effect could not now be published, as the authors would appear to be incompetent, and industry is unlikely to publish any studies showing a positive effect. Clinical studies are now under way, although these must be cross-sectional or cohort studies rather than experiments, which would be unethical. BPA has been banned from use in the manufacture of babies' bottles.


Design, power and the interpretation of studies in the standard murine model of ALS (2008)
(Scott et al 2008, Amyotrophic Lateral Sclerosis 9:4-15). (This paper was referred to as “required reading” by an editorial in Nature Genetics, 2012.)

More than 50 papers have described therapeutic agents which extend the lifespan of SOD1G93A transgenic mice (the standard model of ALS), but only one of these, riluzole, has any clinical effect. So why were all these drugs so ineffective in humans?

Using computer modelling and statistical analysis, Scott et al identified a number of confounding factors (e.g. gender, litter, censoring, copy number) likely to influence the outcome of these experiments, and used a power analysis to develop an optimum experimental design for such studies. They then screened 70 drugs (including those claimed as “positive” in previous tests) using 18,000 mice in 221 studies over a five-year period, applying rigorous statistical techniques. None of the drugs (including riluzole) extended the lifespan of the mice.
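The power-analysis step can be illustrated with the standard normal-approximation sample-size formula for comparing two group means. The survival figures below are invented for illustration and are not taken from Scott et al:

```python
from math import ceil

def n_per_group(delta, sigma, z_alpha=1.96, z_beta=0.8416):
    # Sample size per group for a two-sided, two-sample comparison of means
    # at alpha = 0.05 with 80% power (normal approximation):
    #   n = 2 * ((z_alpha + z_beta) * sigma / delta)^2
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical numbers: mean survival ~130 days, between-mouse SD 10 days,
# and we wish to detect a 5% (6.5-day) extension of lifespan.
print(n_per_group(delta=6.5, sigma=10.0))  # 38 mice per group
```

Note that halving the detectable effect quadruples the required group size, which is why small effects demand large, well-controlled studies, and why reducing variation (rather than group size) is the sensible route to Reduction.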

The identification of these uncontrolled confounding factors, and the failure of the drugs to have any effect in well-controlled repeated experiments, led them to conclude that most of the published effects are simply the result of noise rather than of the drugs. (These noise effects would be exacerbated by any failures in randomisation and blinding.)
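The “noise” explanation can be made concrete with a toy simulation (all numbers invented for illustration): when many small studies test drugs that have no true effect, roughly one in twenty will still reach nominal significance, and those are the results that tend to be published:

```python
import random

random.seed(42)

def sham_trial(n_per_group=4, mu=130.0, sigma=10.0):
    # One underpowered comparison in which the 'drug' has NO effect:
    # treated and control lifespans come from the same distribution.
    treated = [random.gauss(mu, sigma) for _ in range(n_per_group)]
    control = [random.gauss(mu, sigma) for _ in range(n_per_group)]
    diff = sum(treated) / n_per_group - sum(control) / n_per_group
    se = sigma * (2 / n_per_group) ** 0.5  # known-sigma z-test, for simplicity
    return abs(diff / se) > 1.96           # nominally 'significant' at 5%

trials = 10_000
hits = sum(sham_trial() for _ in range(trials))
print(f"{hits} of {trials} ineffective drugs came out 'significant'")
```

Run enough small studies and false positives are guaranteed; failures of randomisation and blinding would push the rate higher still.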