{"id":92,"date":"2024-08-21T11:52:40","date_gmt":"2024-08-21T11:52:40","guid":{"rendered":"https:\/\/routledgelearning.com\/researchmethods\/?post_type=content&p=92"},"modified":"2024-08-21T14:09:50","modified_gmt":"2024-08-21T14:09:50","slug":"chapter-4-validity-in-psychological-research","status":"publish","type":"content","link":"https:\/\/routledgelearning.com\/researchmethods\/student-resources\/chapter-4-validity-in-psychological-research\/","title":{"rendered":"Chapter 4 – Validity in psychological research"},"content":{"rendered":"\n
This chapter investigates issues of experimental validity and links them to the various threats to validity that apply to experiments in particular and to all research methods in general.<\/p>\n<\/div>\n<\/div>\n\n\n\n
Tabatha and her validity threats<\/p>\n\n\n\n
In this chapter of the book there is a description of a rather naff research project carried out by Tabatha. Here it is again. As you read this passage try to identify, and even name if possible, every threat to validity that she has either introduced or failed to control in her design. A list is provided in the answers below.<\/p>\n\n\n\n
Tabatha feels she can train people to draw better. To do this, she asks student friends to be participants in her study, which involves training one group and having the other as a control. She tells friends that the training will take quite some time so those who are rather busy are placed in the control group and need only turn up for the test sessions. Both groups of participants are tested for artistic ability at the beginning and end of the training period, and improvement is measured as the difference between these two scores. The test is to copy a drawing of Mickey Mouse. A slight problem occurs in that Tabatha lost the original pre-test cartoon, but she was fairly confident that her post-test one was much the same. She also found the training was too much for her to conduct on her own so she had to get an artist acquaintance to help, after giving him a rough idea of how her training method worked.<\/p>\n\n\n\n
Those in the trained group have had ten sessions of one hour and, at the end of this period, Tabatha feels she has got on very well with her own group, even though rather a lot have dropped out because of the time needed. One of the control group participants even remarks on how matey they all seem to be and that some members of the control group had noted that the training group seemed to have a good time in the bar each week after the sessions. Some of her trainees sign up for a class in drawing because they want to do well in the final test. Quite a few others are on an HND Health Studies course and started a module on creative art during the training, which they thought was quite fortunate.<\/p>\n\n\n\n
The final difference between groups was quite small but the trained group did better. Tabatha loathes statistics so she decides to present the raw data just as they were recorded. She hasn\u2019t yet reached the recommended reading on significance tests in her RUC self-study pack.<\/p>\n\n\n\n
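Tabatha's failure to run any analysis at all is itself a threat (to statistical conclusion validity). For readers curious what a significance test would actually compute on data like hers, here is a minimal sketch; the improvement scores are invented purely for illustration, since the passage reports no numbers. It calculates Welch's t statistic for two independent groups using only the Python standard library:

```python
# Hypothetical pre-to-post improvement scores on the drawing test.
# These numbers are invented for illustration -- Tabatha reported none.
from math import sqrt
from statistics import mean, stdev

trained = [4, 6, 3, 5, 7, 4]
control = [2, 3, 1, 4, 2, 3]

def welch_t(a, b):
    """Welch's t statistic for two independent samples
    (does not assume equal group variances)."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2  # sample variances
    return (mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))

t = welch_t(trained, control)
print(round(t, 2))
```

The statistic would then be compared with a t distribution to judge whether a difference this large is plausible under the null hypothesis; presenting raw scores alone, as Tabatha does, permits no such judgement.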
Answers: Possible threats to validity in the study:<\/p>\n\n\n\n A confounding variable is one that varies with the independent (or assumed causal) variable and is partly responsible for changes in the dependent variable, thus camouflaging the real effect. Try to spot the possible confounding variables in the following research designs. That is, look for a factor that might well have been responsible for the difference or correlation found, other than the one that the researchers assume is responsible. If possible, think of an alteration to the design that might eliminate the confounding factor. Possible factors will be revealed under each example.<\/p>\n\n\n\n A. Participants are given either a set of 20 eight-word sentences or a set of 20 sixteen-word sentences. They are asked to paraphrase each sentence. At the end of this task they are unexpectedly asked to recall key words that appeared in the sentences. The sixteen-word sentence group performed significantly worse. It is assumed that the greater processing capacity used in paraphrasing sixteen words left less capacity to store individual words.<\/p>\n\n\n\n It could be that the extra time taken to paraphrase the sixteen-word sentences caused greater fatigue or confusion.<\/p>\n\n\n\n B. Male and female dreams were recorded for a week and then analysed by the researcher who was testing the hypothesis that male dream content is more aggressive than female dream content.<\/p>\n<\/details>\n\n\n\n The researcher knew the expected result, hence researcher expectancy is a possible cause of the difference. The solution is to introduce a blind procedure: have the dream content analysed by someone who does not know the sex of the dreamer.<\/p>\n\n\n\n C. People who were fearful of motorway driving were given several sessions of anxiety reduction therapy involving simulated motorway driving. Compared with control participants who received no therapy, the therapy participants were significantly less fearful of motorway driving after a three-month period.<\/p>\n\n\n\n <\/p>\n<\/details>\n\n\n\n There was no placebo group. 
It could be that the therapy participants improved only because they were receiving attention. Need an \u2018attention placebo\u2019 group.<\/p>\n\n\n\n D. After a two-year period depressed adolescents were found to be more obese than non-depressed adolescents and it was assumed that depression was the major cause of the obesity increase.<\/p>\n<\/details>\n\n\n\n Depression will probably correlate with lowered physical activity and this factor may be responsible. Needs depressed adolescents to be compared with similarly inactive non-depressed adolescents.<\/p>\n\n\n\n E. People regularly logging onto Chat \u2019n Share, an internet site permitting the sharing of personal information with others on a protected, one-to-one basis, were found to be more lonely after one year\u2019s use than non-users. It was assumed that using the site was a cause of loneliness.<\/p>\n<\/details>\n\n\n\n Those using the site had less time to spend interacting with other people off-line; need to be compared with people spending equal time on other online activities.<\/p>\n\n\n\n F. Participants are asked to sort cards into piles under two conditions. First they sort cards with attractive people on them, then they sort ordinary playing cards. The first task takes much longer. The researchers argue that the pictures of people formed an inevitable distraction, which delayed decision time.<\/p>\n<\/details>\n\n\n\n <\/p>\n\n\n\n Order effect! The researcher has not counter-balanced conditions. The participants may simply have learned to perform the task faster in the second condition through practice on the first.<\/p>\n\n\n\n G. It is found that young people who are under the age limit for the violent electronic games they have been allowed to play are more aggressive than children who have only played games intended for their age group. 
It is assumed that the violent game playing is a factor in their increased aggression.<\/p>\n<\/details>\n\n\n\n This is only a correlation and there may be a third causal variable that is linked to both variables. Perhaps the socio-economic areas in which children are permitted to play under age are also those areas where aggression is more likely to be a positive social norm.<\/p>\n<\/details>\n\n\n\n An extended discussion of the concept of ecological validity<\/strong><\/p>\n\n\n\n In Chapter 4 there is a discussion of the much misused and poorly understood concept of ecological validity. This is the original discussion which I trimmed down for the book.<\/p>\n\n\n\n I attempt here to fully discuss the meaning of this enigmatic and catch-all term \u2018ecological validity\u2019 because its widespread and over-generalised use has become somewhat pointless. Hammond (1998) refers to its use as \u2018casual\u2019 and \u2018corrupted\u2019 and refers to the robbing of its meaning (away from those who continue to use its original sense) as \u2018bad science, bad scholarship and bad manners\u2019.<\/p>\n\n\n\n There are three relatively distinct and well used uses of the term, which I shall call \u2018the original technical\u2019, \u2018the external validity version\u2019 and \u2018the pop version\u2019, the latter term to signify that this use I would consider to be unsustainable since it has little to do with validity and its indiscriminate use will not survive close scrutiny.<\/p>\n\n\n\n Brunswik (e.g., 1947) introduced the term ecological validity to psychology as an aspect of his work in perception \u2018to indicate the degree of correlation between a proximal (e.g., retinal) cue and the distal (e.g., object) variable to which it is related\u2019 (Hammond, 1998). This is a very technical use. The proximal stimulus is the information received directly by the senses \u2013 for instance two lines of differing lengths on our retinas. 
The distal stimulus is the actual object in the environment from which we are receiving that information. If we know that the two lines are from two telegraph poles at different distances from us, we might interpret the two poles as the same size but one further away than the other. The two lines have ecological validity in so far as we know how to usefully interpret them in an environment that we have learned to read in terms of perspective cues. The two lines do not appear to us as having different lengths because we interpret them in the context of other cues that tell us how far away the two poles are. In that context their ecological validity is high in predicting that we are seeing telegraph poles. More crudely, brown patches on an apple are ecologically valid predictors of rottenness; a blue trade label on the apple tells us very little about rot. <\/p>\n\n\n\n Many textbooks, including this one, have taken the position that ecological validity is an aspect of external validity and refers to the degree of generalisation that is possible from results in one specific study setting to other different settings. This has usually had an undertone of comparing the sparseness of the experimental environment with the greater complexity of a \u2018real\u2019 setting outside the laboratory. In other words, researchers asked \u2018how far will the results of this laboratory experiment generalise to life outside it?\u2019 The general definition, however, has concerned the extent of generalisation of findings from one setting to another and has allowed for the possibility that a study in a \u2018real life\u2019 setting may produce low ecological validity because its results do not generalise to any other setting \u2013 see the Hofling study below. 
Most texts refer to Bracht and Glass (1968) as the originators of this sense of the term, and the seminal work by Cook and Campbell (1979) also supported this interpretation.<\/p>\n\n\n\n On this view, effects can be said to have demonstrated ecological validity the more they generalise to different settings, and this can be established empirically by replicating studies in different research contexts.<\/p>\n\n\n\n The pop version is the definition very often taught on basic psychology courses. It takes the view that a study has (high) ecological validity so long as the setting in which it is conducted is \u2018realistic\u2019, or the materials used are \u2018realistic\u2019, or indeed if the study itself is naturalistic or in a \u2018natural\u2019 setting (e.g., Howitt, 2013). The idea is that we are likely to find out more about \u2018real life\u2019 if the study is in some way close to \u2018real life\u2019, begging the question of whether the laboratory is not \u2018real life\u2019.<\/p>\n\n\n\n The problem with the pop version is that it has become a knee-jerk mantra \u2013 the more realistic the study, the higher its ecological validity. There is, however, no way to gauge the extent of this validity. It is just assumed, so much so that even A-level students are asked to judge the degree of ecological validity of fictitious studies with no information given about successful replications or otherwise.<\/p>\n\n\n\n Teaching students that ecological validity refers to the realism of studies or their materials simply adds to the psychological glossary a new \u2018floating term\u2019 that is completely unnecessary, since we already have the terminology. The word to use is \u2018realism\u2019. 
As it is, students taught the pop version simply have to learn to substitute \u2018realism\u2019 when they see \u2018ecological validity\u2019 in an examination question.<\/p>\n\n\n\n For those concerned about the realism of experimental designs, Hammond (1998) points out that Brunswik (1947) introduced another perfectly suitable term. He used representative design to refer to the need to design experiments so that they sample materials from among those to which the experimenter wants to generalise effects. He asked that experimenters specify in their design the circumstances to which they wished to generalise. For instance, in a study on person perception, in the same way as we try to use a representative sample of participants, we should select a representative sample of stimulus persons (those whom participants will be asked to make a judgment about) in order to be able to generalise effects to a wider set of perceived people. Hammond is not the only psychologist worried about the misuse of Brunswik\u2019s term. Ara\u00fajo, Davids and Passos (2007) argue that the popular \u2018realism\u2019 definition of ecological validity is a confusion of the term with representative design:<\/p>\n\n\n\n \u2018\u2026 ecological validity, as Brunswik (1956) conceived it, refers to the validity of a cue (i.e., perceptual variable) in predicting a criterion state of the environment. Like other psychologists in the past, Rogers et al. (2005) confused this term with another of Brunswik\u2019s terms: representative design.\u2019 (p.69)<\/p>\n\n\n\n This article by Ara\u00fajo et al. is a good place to start understanding what Brunswik actually meant by ecological validity and demonstrates that arguments to haul its meaning back to the original are contemporary and not old-fashioned. The term is in regular use in its original meaning by many cognitive psychologists. 
They are not clinging to a \u2018dinosaur\u2019 interpretation in the face of unstoppable changes in the evolution of human language.<\/p>\n\n\n\n Another problem with the pop version is that it doesn\u2019t teach students anything at all about validity as a general concept. It simply teaches them to spot when material or settings are not realistic and encourages them to claim that this is a \u2018bad thing\u2019. It leads to confusion with the laboratory\u2013field distinction and a clich\u00e9d positive evaluation of the latter over the former. For example, let\u2019s compare Milgram\u2019s famous laboratory studies of obedience with another obedience study by Hofling et al. (1966), where nurses working in a hospital, unaware of any experimental procedure, were telephoned by an unknown doctor and broke several hospital regulations by starting to administer, at the doctor\u2019s request, a potentially lethal dose of an unknown medicine. The pop version would describe Hofling\u2019s study as more \u2018ecologically valid\u2019 because it was carried out in a naturalistic hospital setting on real nurses at work. In fact, this would be quite wrong in terms of external validity since the effect has never been replicated. The finding seems to have been limited to that hospital at that time with those staff members. A partial replication of Hofling\u2019s procedures failed to produce the original obedience effect (Rank and Jacobson, 1977[1]<\/a>), whereas Milgram\u2019s study has been successfully replicated in several different countries using a variety of settings and materials. In one of Milgram\u2019s variations, validity was demonstrated when it was shown that shifting the entire experiment away from the university laboratory and into a \u2018seedy\u2019 downtown office, apparently run by independent commercial researchers, did not significantly reduce obedience levels. 
Here, following the pop version, we seem to be in the ludicrous situation of saying that Hofling\u2019s effect is valid even though there is absolutely no replication of it, while Milgram\u2019s is not, simply because he used a laboratory! In fact, Milgram\u2019s study does demonstrate ecological validity on the generalisation criterion. The real problem is that there is no sense of \u2018validity\u2019 in the pop notion of ecological validity.<\/p>\n\n\n\n In a thorough discussion of ecological validity, Kvavilashvili and Ellis (2004) bring the original and external validity usages together by arguing that both representativeness and generalisation are involved, with generalisation appearing to be the more dominant concept. Generalisation improves the more that representativeness is dealt with. However, they argue that a highly artificial and unrealistic experiment can still demonstrate an ecologically valid effect. They cite as an example Ebbinghaus\u2019s memory tasks with nonsense syllables. His materials and task were quite unlike everyday memory tasks, but the effects Ebbinghaus demonstrated could be shown to operate in everyday life, though they were confounded by many other factors. The same is true of research in medicine or biology; we observe a phenomenon, conduct highly artificial experiments in the laboratory (e.g., by growing cultures on a dish), then re-interpret results in the everyday world by extending our overall knowledge of the operation of diseases and producing new treatments. In psychology, though, it is felt that by making tasks and settings more realistic we have a good chance of increasing ecological validity. 
Nevertheless, ecological validity must always be assessed using research outcomes and not guessed at because a study is \u2018natural\u2019.<\/p>\n\n\n\n I think the conclusions that emerge from this review of the uses of ecological validity are that:<\/p>\n\n\n\n [1]<\/a> Unlike in Hofling\u2019s study, nurses were familiar with the drug and were able to communicate freely with peers.<\/p>\n\n\n\n An article on Clever Hans \u2013 the horse that could \u201ccount\u201d \u2013 explained by confounding:<\/p>\n\n\n\n https:\/\/www.damninteresting.com\/clever-hans-the-math-horse<\/a><\/p>\n\n\n\n Articles on the replication crisis:<\/p>\n\n\n\n https:\/\/www.science.org\/doi\/epdf\/10.1126\/science.349.6251.910<\/a><\/p>\n\n\n\n https:\/\/onlinelibrary.wiley.com\/doi\/full\/10.1111\/dmcn.14054<\/a><\/p>\n\n\n\n Hammond\u2019s article on the hijacking of the term \u201cecological validity\u201d:<\/p>\n\n\n\n www.brunswik.org\/notes\/essay2.html<\/a><\/p>\n<\/div>\n\n\n\nShow answer<\/summary>\n
Name of threat<\/strong><\/td> Issue in text<\/strong><\/td><\/tr> Non-equivalent groups<\/td> Busy students go into the control group.<\/td><\/tr> Non-equivalent measures<\/td> Different Mickey Mouse pre- and post-test; a form of construct validity threat.<\/td><\/tr> Non-equivalent procedures<\/td> Training method not clearly and operationally defined for her artist acquaintance.<\/td><\/tr> Attrition<\/td> More participants dropped out of the training group than from the control group.<\/td><\/tr> Rivalry<\/td> Control group participants note some trainee group participants go for extra training in order to do well.<\/td><\/tr> History effect<\/td> Some participants in the training group receive creative art training on their new HND module.<\/td><\/tr> Statistical conclusion validity<\/td> Not a misapplication of statistical analysis but no analysis at all!<\/td><\/tr><\/tbody><\/table><\/figure>\n<\/details>\n\n\n\n Exercise 4.2<\/h3>\n\n\n\n
Spotting the confounding variables<\/h4>\n\n\n\n
Show answer<\/summary>\n
Show answer<\/summary>\n
Show answer<\/summary>\n
Show answer<\/summary>\n
Show answer<\/summary>\n
Show answer<\/summary>\n
Show answer<\/summary>\n
\n\n\n\nFurther Information<\/h2>\n\n\n\n
Ecological validity<\/h3>\n\n\n\n
1. The original technical meaning<\/h4>\n\n\n\n
2. The external validity meaning<\/h4>\n\n\n\n
3. The \u2018pop\u2019 version<\/h4>\n\n\n\n
Milgram v. Hofling \u2013 which is more \u2018ecologically valid\u2019?<\/h3>\n\n\n\n
\n
\n\n\n\n
\n\n\n\nWeblinks<\/h2>\n\n\n\n
Validity in psychological research weblinks<\/h3>\n\n\n\n
On this page<\/h2>\n\n\n