Non-replications: The good, the bad, and the ugly // Cogsci

Let's face it, psychological experiments tend to produce flimsy data. As a result, it often happens that experimental psychologists fail to replicate the results of others. You can deal with a failure to replicate gracefully, defending your results if necessary, and ultimately admitting you were wrong if indeed you were. Since psychologists are wrong all the time, there really should be no shame in this.

Recently(ish), there were two high-profile non-replications that attracted considerable attention. I think these two cases illustrate the different ways in which scientists can interact. And I won't be coy about passing moral judgement here: There is a good and a bad way.

The first case is kind of amusing, I think. It concerns a study by Daryl J. Bem that purportedly proves the existence of precognition, clairvoyance, or whatever you want to call it. From the abstract:

"This article reports 9 experiments, involving more than 1,000 participants, that test for retroactive influence by 'time-reversing' well-established psychological effects so that the individual's responses are obtained before the putatively causal stimulus events occur."

The fact that this was published in a serious scientific journal is bound to draw scepticism. I hear the taxpayer wondering: "Is this what you guys are doing with my money?" Well... yes.

But on the other hand, you could also argue that the publication of this paper demonstrates that researchers can be quite open-minded. Bem's study may be silly, but it's not flawed in any obvious way, so it deserves to be taken seriously.

Taking a study seriously also means scrutinising it, which is what Erik-Jan Wagenmakers and his group did. Wagenmakers is a statistical expert who thrives on taking down other researchers' findings. He usually does this in a playful, funny way, and his papers are a joy to read. (As long as they are not about your own research, of course.) His main criticism of Bem's paper is that Bem cherry-picked his analyses: if you perform enough analyses on your data, some are bound to show significant results just by chance. Wagenmakers explains:

"For instance, Bem's Experiment 1 tested not just erotic pictures, but also neutral pictures, negative pictures, positive pictures, and pictures that were romantic but non-erotic. Only the erotic pictures showed any evidence for precognition. But now suppose that the data would have turned out differently and instead of the erotic pictures, the positive pictures would have been the only ones to result in performance higher than chance. Or suppose the negative pictures would have resulted in performance lower than chance. It is possible that a new and different story would then have been constructed around these other results."
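To get a feel for how quickly "just by chance" adds up, here is a minimal sketch. It is not Wagenmakers' analysis, and it does not model Bem's actual data. It simply assumes that every test is independent and uses the conventional significance threshold of 0.05; both are assumptions of mine, and the function name `familywise_error_rate` is made up for illustration. Under those assumptions, the chance of at least one false positive among k tests is 1 - (1 - alpha)^k.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among `num_tests` tests.

    Assumes the tests are independent and that no real effect exists,
    so each test has probability `alpha` of a spurious 'significant' result.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # 5 corresponds to the five picture categories listed in the quote above;
    # 1 and 20 are arbitrary values added for comparison.
    for num_tests in (1, 5, 20):
        rate = familywise_error_rate(num_tests)
        print(f"{num_tests:2d} tests -> {rate:.3f}")
```

With five picture categories, that works out to roughly a 23% chance that at least one category looks "significant" even if nothing is going on. Real analyses are rarely independent, so treat this as a rough illustration of the problem rather than an estimate for any particular study.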

This may sound very bad, but the thing to note here is that Bem didn't really do anything wrong. Or, more accurately, he didn't do anything unusual. Many psychologists, perhaps all of them (us), perform these kinds of dubious analyses. And to his credit, Bem appears to have encouraged other researchers to replicate his effects, even sharing his experimental software and stimuli (which should not be remarkable, but it is). This suggests that Bem genuinely believed in his own results.

He may therefore have been quite surprised when other groups failed to replicate his results. But the fact that Bem was (reasonably) open and cooperative makes this an example of good, if weird, scientific progress: Bem proposed a wild idea that appeared to be supported by the evidence. His study was viewed with scepticism, but taken seriously, scrutinised, and ultimately (largely) discredited. The fact that his study was not discarded out of hand is important, because, as French points out:

"Once we think we know in advance which effects are real and which are illusory, true scientific objectivity flies out of the window. Having said that, my personal opinion is that retroactive facilitation of recall is not a real effect."

So far, so good. The bad case concerns a study by John A. Bargh and his colleagues. In this study, which is something of a contemporary classic, participants were shown words that could be related to old age. These were obvious words such as old, retired and grey, and less obvious words such as selfishly, gullible, bitter, and lonely. (In all fairness, words were chosen based on a previous study, and not derived from the experimenters' own prejudices.) The crucial finding was that participants who were unwittingly primed with old-age-related words walked away from the experiment more slowly than the control subjects who saw other words. Bargh concluded from this (and similar findings) that behaviour can be modified by implicit primes: In this case, seeing old-age-related words activates your stereotype of an elderly person, and you consequently behave somewhat like an elderly person yourself (e.g., by walking more slowly).

However, a few months ago, Doyen and his colleagues published a failure to replicate this effect. They found that old-age-related words had no effect on walking speed at all. Unless, and this is where the badness begins, the experimenters were aware of the expected outcome. From this, they concluded that experimenter bias probably confounded the results of Bargh's original study. From the abstract:

"This [failure to replicate] suggests that both priming and experimenters' expectations are instrumental in explaining the walking speed effect."

This conclusion is very tempting, but premature. As Sanjay Srivastava also points out, the fact that experimenter bias can induce the effect (which is hardly surprising) does not mean that it actually induced the effect in the original study. It may have, but the study by Doyen and colleagues doesn't 'suggest' this at all. Unfortunately, many people have failed to grasp this point, and Doyen's study has been widely interpreted as disproving Bargh's original claims.

So it's understandable that Bargh was a bit annoyed. I personally think that implicit priming is a real effect, even though it may be difficult to replicate. But that's not the point. The point is that Bargh's response was outrageous. I invite you to read his blog post, entitled "Nothing in their Heads", but I'll provide you with some highlights as well.

He starts by attacking PLoS ONE, the open access journal in which Doyen and colleagues published their results:

"The Doyen et al. article appeared in an online journal, PLoS ONE, which quite obviously does not receive the usual high scientific journal standards of peer-review scrutiny."

This is not quite obvious to me at all. In fact, PLoS ONE is a highly esteemed journal, widely supported by the scientific community, and at the forefront of open access publishing. Publishing in PLoS ONE is certainly not self-publishing as Bargh claims elsewhere in his rant. It appears that he aims to discredit Doyen and colleagues' results by attacking the journal in which these were published.

He then goes on to accuse PLoS ONE of not having a proper editorial process, saying that:

"Expert editors also know the relevant theory and past research in a given domain, and also know of common methodological pitfalls that inexpert researchers in the domain—such as, apparently, Doyen et al. (keep reading)—can fall prey to."

Bargh clearly feels (and says) that Doyen and his colleagues are amateurs, a point which he drives home by referring to them as "incompetent or ill-informed researchers performing the replication."

Again, the point here is not whether implicit priming exists or not, but that we may never find out when the tone of the debate is like this. On the bright side though, non-replications are finally starting to be published. And that's in no small part due to PLoS ONE, the online journal that quite obviously does not meet the high scientific standard of refusing to publish non-replications.

References

Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230-244.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-427.

Doyen, S., Klein, O., Pichon, C.-L., & Cleeremans, A. (2012). Behavioral priming: It’s all in the mind, but whose mind? PLoS ONE, 7(1), e29081. doi:10.1371/journal.pone.0029081

Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s "retroactive facilitation of recall" effect. PLoS ONE, 7(3), e33423. doi:10.1371/journal.pone.0033423

Wagenmakers, E. J., Wetzels, R., Borsboom, D., & Van Der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-433.