“It’s a truly steep climb to cart unsuccessful replications of other scientists’ studies up Mount Publication. Major journals make no secret of preferring papers that describe novel advances and attract media attention, says psychologist Hal Pashler of the University of California, San Diego. Not surprisingly, failed replications often get filed and forgotten, meaning studies that support priming have gotten more attention than those that don’t.”
If you don’t read the article, or if you read it and remember only one thing, the most important thing mentioned in Bower’s article “The Hot and Cold of Priming” is the site PsychFileDrawer.org, a site for posting unpublished replications of previous psychological studies.
As the quote I opened this article with mentions, it’s a lot harder to get journals to publish attempted replications of previous studies when the replication did not observe the same result.
There’s a lot of detail I’m not going to go into, since Bower’s very good article already covers most of it. To briefly summarize:
- “Priming” is the theory that people respond to subliminal cues. If you are holding a cup of warm coffee, you will have a warmer attitude toward strangers; if you are holding a cup of cold liquid, a colder one. If you read about older people, you will move more slowly yourself, since older people are generally regarded as moving slowly.
- Priming got a lot of publicity in 1996, when Yale University psychologist John Bargh published the results of a number of priming studies he had done. In one example, college students worked with scrambled sentences. Students whose assigned sentences contained words often associated with older people, such as “wrinkle” and “Florida”, took about a second longer to walk down an exit hallway at the end of the study.
- When some other psychologists, including Stéphane Doyen of Université Libre de Bruxelles in Belgium, tried to replicate Bargh’s experiments, they found a lot of the experiments’ results depended on whether the testers administering the experiment knew what results would be considered positive for the experiment (“positive” in the sense of showing an effect that is statistically significant).
And the fight is on . . . .
Skeptics of priming say that since the results repeat most reliably when test administrators are told what results the test is hoping to find, the tests actually show something else: volunteers for psychology tests are remarkably consistent at noticing and responding to subtle unspoken cues from test administrators, producing whatever result the test subjects think will make the administrators happy. In at least one test, the subjects were able to guess fairly accurately afterwards what the test was about and what effect it was trying to produce on them. It would follow that psychology tests need to be designed very, very carefully, and that reports need to include a lot of detail so everyone reading (and citing or relying on) the results knows exactly what the test did and didn’t do.
Supporters of priming say no, the tests were fine, the skeptics introduced biases of their own by conducting the tests in different countries where various stereotypes (such as “old people are slow”) are not as prevalent. And besides, say supporters, skeptics just don’t like the possibility that complex behavior can be affected so much by small subtle cues. So the skeptics are letting their own biases influence the results of their studies, but the original priming studies were fine. 
Whichever it is (personally, I think the skeptics have some pretty valid points), what Bower’s article establishes very strongly is that a lot of psychology studies — and perhaps the whole field of psychology — could benefit from more openness, more attempts at replicating studies, more acknowledgement and publication of failed attempts at replication, and more rigorous thinking about the scientific method in general.
There’s a section in the article about the “null hypothesis”. I’ll admit I didn’t completely understand it, but what I did grasp is that psychologists come up with a null hypothesis that a certain variable doesn’t actually affect anything. Then they design a test that should turn out one way if the variable truly doesn’t affect anything; if they can instead show a statistically significant effect associated with that variable, the null hypothesis is rejected.
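As I understand it, the mechanics look roughly like this. Below is a toy sketch in Python with entirely made-up walking times (none of these numbers come from Bargh’s study or any real one); the permutation-test approach shown here is just one common way to get a p-value against a null hypothesis:

```python
import random
import statistics

# Made-up walking times, in seconds.  Null hypothesis: the "primed"
# label has no effect on walking time.
random.seed(1)
primed  = [8.1, 7.9, 8.4, 8.0, 7.8, 8.3, 8.2, 7.7]
control = [7.2, 7.5, 7.0, 7.4, 7.1, 7.6, 7.3, 6.9]

observed = statistics.mean(primed) - statistics.mean(control)

# If the null hypothesis were true, the group labels would be
# interchangeable.  So: shuffle the labels many times and count how
# often a difference at least this large appears by pure chance.
pooled = primed + control
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.2f} s, p = {p_value:.4f}")
```

A small p-value lets you reject the null hypothesis, but, as the quote below makes clear, it says nothing about which explanation for the difference is the right one.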
But what that actually does (assuming I haven’t completely misunderstood or bungled the way I wrote that) is conflate correlation with causation. Just because two observed behaviors seem to be linked in some fashion doesn’t mean one causes the other.
However, the following passage from the article leads me to believe my explanation and criticism are not too far off the mark:
“At the San Diego meeting, psychologist Joseph Simmons of the University of Pennsylvania in Philadelphia argued that researchers often arbitrarily exclude data considered unreliable, alter experimental conditions that don’t work as planned and otherwise fiddle with what goes into a final report. Cherry-picking data in this way masks the statistical weakness of published studies and raises doubts about many reports of statistical significance, he said.
For decades, a string of influential psychologists have recommended disposing of null hypothesis significance testing altogether, calling the approach an unscientific ritual that should be replaced by testing specific predictions. In the February Theory & Psychology, psychologist Charles Lambdin of Intel Corp.’s campus at Ronler Acres in Hillsboro, Ore., calls significance testing psychology’s “dirty little secret.”
A 5 percent significance level merely indicates that chance may not be responsible for slowed walking among readers of elderly references, Lambdin says. But the likelihood of any proposed explanation for the results remains unknown. In Bargh’s study, slow walking might be due to priming, subtle coaxing by experimenters, volunteers’ guessing the purpose of the study, a combination of all three or something else entirely. A rejected null hypothesis sulks in the corner, saying nothing about the relative merits of any potential reason for its existence.”
Students taking a second longer to walk down a hallway is not what I would call an earthshaking result. Yes, the experiments are looking for subtle effects, but unless that’s a really, really short hallway, a single second sounds more like statistical noise than a statistically significant result to me.
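To be fair, whether a one-second gap is noise or signal depends on how much walking times vary and how many subjects there are. Here’s another toy sketch (made-up numbers again, and a hand-rolled Welch t statistic rather than any particular study’s actual analysis): the same one-second difference in group means stands out when times are tightly clustered and gets lost when they are noisy.

```python
import random
import statistics

def t_statistic(a, b):
    """Welch's two-sample t statistic for independent samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (mean_a - mean_b) / standard_error

random.seed(0)

# Same 1-second gap in true means, two very different spreads.
control_tight = [random.gauss(7.0, 0.5) for _ in range(30)]
primed_tight  = [random.gauss(8.0, 0.5) for _ in range(30)]

control_noisy = [random.gauss(7.0, 4.0) for _ in range(30)]
primed_noisy  = [random.gauss(8.0, 4.0) for _ in range(30)]

# Tight spread: the 1-second gap yields a large |t| (clearly "significant").
print(t_statistic(primed_tight, control_tight))
# Wide spread: the same gap yields a much smaller |t| (likely lost in noise).
print(t_statistic(primed_noisy, control_noisy))
```

So my hunch only holds if walking times down that hallway vary by more than a second or so between people anyway, which, without the spread and sample sizes in front of me, I can’t actually check.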
The argument that the skeptics are just upset at the notion of small cues influencing complicated behaviors, and that’s why they design tests that fail to replicate original results, sounds to me like a red herring (although probably an unintentional one). There’s a whole separate topic I need to write about one day regarding the belief among many people — especially academics, college graduates, and other designated “smart” people — that if someone feels an emotion about something, then they’re not capable of being objective and their arguments can be dismissed. I personally think this viewpoint is foolish at best, and at worst a deliberate attempt to avoid a question by casting doubt on the questioner. If a question is valid, it remains valid regardless of the emotional state of the person asking it. In fact, to me that is a test of whether a line of reasoning is well-constructed: does it still stand up to examination if you vary your assumptions about the reasoner’s state of mind?