
Chapter 4 – Validity in psychological research

This chapter investigates issues of experimental validity and links these to the various threats to validity that apply to experiments in particular and to all research methods in general.

Exercises

Exercise 4.1

Tabatha and her validity threats

In this chapter of the book there is a description of a rather naff research project carried out by Tabatha. Here it is again. As you read this passage try to identify, and even name if possible, every threat to validity that she has either introduced or failed to control in her design. A list is provided in the answers below.

Tabatha feels she can train people to draw better. To do this, she asks student friends to be participants in her study, which involves training one group and having the other as a control. She tells friends that the training will take quite some time so those who are rather busy are placed in the control group and need only turn up for the test sessions. Both groups of participants are tested for artistic ability at the beginning and end of the training period, and improvement is measured as the difference between these two scores. The test is to copy a drawing of Mickey Mouse. A slight problem occurs in that Tabatha lost the original pre-test cartoon, but she was fairly confident that her post-test one was much the same. She also found the training was too much for her to conduct on her own so she had to get an artist acquaintance to help, after giving him a rough idea of how her training method worked.

Those in the trained group have had ten sessions of one hour and, at the end of this period, Tabatha feels she has got on very well with her own group, even though rather a lot have dropped out because of the time needed. One of the control group participants even remarks on how matey they all seem to be and that some members of the control group had noted that the training group seemed to have a good time in the bar each week after the sessions. Some of her trainees sign up for a class in drawing because they want to do well in the final test. Quite a few others are on an HND Health Studies course and started a module on creative art during the training, which they thought was quite fortunate.

The final difference between groups was quite small but the trained group did better. Tabatha loathes statistics so she decides to present the raw data just as they were recorded. She hasn’t yet reached the recommended reading on significance tests in her RUC self-study pack.

Answers: Possible threats to validity in the study:

Name of threat, followed by the issue in the text:

  • Non-equivalent groups: Busy students go into the control group.
  • Non-equivalent measures: Different Mickey Mouse drawings at pre-test and post-test; a form of construct validity threat.
  • Non-equivalent procedures: The training method was not clearly and operationally defined for her artist acquaintance.
  • Attrition: More participants dropped out of the training group than from the control group.
  • Rivalry: Control group participants note that some training group participants go for extra training in order to do well.
  • History effect: Some participants in the training group receive creative art training on their new HND module.
  • Statistical conclusion validity: Not a misapplication of statistical analysis but no analysis at all! (A sketch of the minimal analysis she could have run follows below.)
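
Purely as an illustration of that missing analysis, the sketch below uses invented improvement scores (post-test minus pre-test drawing ratings); the data values, the group sizes and the choice of an independent-samples t test are assumptions for the example, not details from Tabatha's study.

```python
# Illustration only: invented improvement scores (post-test minus pre-test
# drawing ratings) for a trained group and a control group.
from scipy import stats

trained_improvement = [4, 6, 3, 5, 7, 4, 5]      # hypothetical values
control_improvement = [3, 2, 4, 3, 5, 2, 3, 4]   # hypothetical values

# Independent-samples t test comparing mean improvement across the two groups.
t_statistic, p_value = stats.ttest_ind(trained_improvement, control_improvement)

print(f"t = {t_statistic:.2f}, p = {p_value:.3f}")
# Only if p fell below the conventional .05 level could Tabatha begin to claim
# a statistically significant training effect (and even then the validity
# threats listed above would remain).
```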

Exercise 4.2

Spotting the confounding variables

A confounding variable is one that varies with the independent (or assumed causal) variable and is partly responsible for changes in the dependent variable, thus camouflaging the real effect. Try to spot the possible confounding variables in the following research designs. That is, look for a factor that might well have been responsible for the difference or correlation found, other than the one that the researchers assume is responsible. If possible, think of an alteration to the design that might eliminate the confounding factor. Possible factors will be revealed under each example.
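
Before looking at the examples, it can help to see the logic of a confound in miniature. The following sketch is purely illustrative (all values and names are invented): the 'treatment' itself has no effect on the outcome, but because a confounding variable covaries with group membership, the two groups still end up with different mean scores.

```python
# Illustration only: a confound that covaries with the independent variable
# produces a group difference even though the treatment itself does nothing.
import random
from statistics import mean

random.seed(1)

def outcome(confound_level):
    # The dependent variable depends only on the confound plus random noise;
    # group membership (treatment vs control) has zero direct effect.
    return 10 + 2 * confound_level + random.gauss(0, 1)

# The confound happens to be higher in the 'treatment' group.
treatment_scores = [outcome(confound_level=3) for _ in range(50)]
control_scores = [outcome(confound_level=1) for _ in range(50)]

print(f"Treatment mean: {mean(treatment_scores):.2f}")
print(f"Control mean:   {mean(control_scores):.2f}")
# The treatment group scores higher, but the difference is produced entirely
# by the confounding variable, not by the treatment.
```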

A. Participants are given either a set of 20 eight-word sentences or a set of 20 sixteen-word sentences. They are asked to paraphrase each sentence. At the end of this task they are unexpectedly asked to recall key words that appeared in the sentences. The sixteen-word sentence group performed significantly worse. It is assumed that the greater processing capacity used in paraphrasing sixteen words left less capacity to store individual words.

Answer:

It could be that the extra time taken to paraphrase the longer sentences caused greater fatigue or confusion.

B. Male and female dreams were recorded for a week and then analysed by the researcher who was testing the hypothesis that male dream content is more aggressive than female dream content.

Answer:

The researcher knew the expected result, hence researcher expectancy is a possible cause of the difference. The solution is to introduce a single blind: have the dream reports analysed by someone who does not know the sex of the dreamer.

C. People who were fearful of motorway driving were given several sessions of anxiety reduction therapy involving simulated motorway driving. Compared with control participants who received no therapy, the therapy participants were significantly less fearful of motorway driving after a three-month period.

Answer:

There was no placebo group. It could be that the therapy participants improved only because they were receiving attention. Need an ‘attention placebo’ group.

D. After a two-year period depressed adolescents were found to be more obese than non-depressed adolescents and it was assumed that depression was the major cause of the obesity increase.

Answer:

Depression will probably correlate with lowered physical activity and this factor may be responsible. Needs depressed adolescents to be compared with similarly inactive non-depressed adolescents.

E. People regularly logging onto Chat ’n Share, an internet site permitting the sharing of personal information with others on a protected, one-to-one basis, were found to be more lonely after one year’s use than non-users. It was assumed that using the site was a cause of loneliness.

Answer:

Those using the site had less time to spend interacting with other people off-line; need to be compared with people spending equal time on other online activities.

F. Participants are asked to sort cards into piles under two conditions. First they sort cards with attractive people on them, then they sort ordinary playing cards. The first task takes much longer. The researchers argue that the pictures of people formed an inevitable distraction, which delayed decision time.

Answer:

Order effect! The researcher has not counter-balanced conditions. The participants may simply have learned to perform the task faster in the second condition through practice on the first.
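
As a hedged illustration of the counterbalancing mentioned above (the participant IDs and condition labels are invented for the sketch), alternating the order of the two conditions across participants spreads any practice effect evenly over both tasks:

```python
# Illustration only: counterbalancing two conditions (A = cards showing
# attractive people, B = ordinary playing cards) across participants so that
# practice effects do not systematically favour the second condition.
participants = [f"P{i:02d}" for i in range(1, 9)]   # hypothetical IDs

orders = {}
for index, participant in enumerate(participants):
    # Half the participants do A then B; the other half do B then A.
    orders[participant] = ["A", "B"] if index % 2 == 0 else ["B", "A"]

for participant, order in orders.items():
    print(participant, "->", " then ".join(order))
```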

G. It is found that young people who are under the age limit for the violent electronic games they have been allowed to play are more aggressive than children who have only played games intended for their age group. It is assumed that the violent game playing is a factor in their increased aggression.

Answer:

This is only a correlation and there may be a third causal variable that is linked to both variables. Perhaps the socio-economic areas in which children are permitted to play age-restricted games are also those areas where aggression is more likely to be a positive social norm.


Further Information

An extended discussion of the concept of ecological validity

In Chapter 4 there is a discussion of the much misused and poorly understood concept of ecological validity. This is the original discussion which I trimmed down for the book.

Ecological validity

I attempt here to discuss fully the meaning of this enigmatic and catch-all term ‘ecological validity’ because its widespread and over-generalised use has become somewhat pointless. Hammond (1998) refers to its use as ‘casual’ and ‘corrupted’ and describes the robbing of its meaning (away from those who continue to use it in its original sense) as ‘bad science, bad scholarship and bad manners’.

There are three relatively distinct and commonly encountered uses of the term, which I shall call ‘the original technical’, ‘the external validity version’ and ‘the pop version’, the last label signifying that I consider this use unsustainable, since it has little to do with validity and its indiscriminate use will not survive close scrutiny.

1. The original technical meaning

Brunswik (e.g., 1947) introduced the term ecological validity to psychology as an aspect of his work in perception ‘to indicate the degree of correlation between a proximal (e.g., retinal) cue and the distal (e.g., object) variable to which it is related’ (Hammond, 1998). This is a very technical use. The proximal stimulus is the information received directly by the senses – for instance, two lines of differing lengths on our retinas. The distal stimulus is the actual object in the environment from which that information comes. If we know that the two lines come from two telegraph poles at different distances from us, we may interpret the poles as being the same size, one simply further away than the other. The two lines have ecological validity in so far as we know how to interpret them usefully in an environment we have learned to read in terms of perspective cues. The two lines do not appear to us to have different lengths because we interpret them in the context of other cues that tell us how far away the two poles are. In that context their ecological validity is high in predicting that we are seeing telegraph poles. More crudely, brown patches on an apple are ecologically valid predictors of rottenness; a blue trade label on the apple tells us very little about rot.
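
In this original sense, the ecological validity of a cue is simply its correlation, across a set of environmental objects, with the distal variable it is used to predict. The following sketch uses invented figures (the cue and object values are assumptions for illustration only) to show how such a cue-validity coefficient would be computed:

```python
# Illustration only: Brunswik's original sense of ecological validity as the
# correlation between a proximal cue and the distal variable it predicts.
from statistics import correlation  # available from Python 3.10

proximal_cue = [2.1, 3.4, 1.2, 4.0, 2.8, 3.1]      # e.g., retinal image sizes
distal_variable = [2.0, 3.6, 1.0, 4.2, 2.5, 3.3]   # e.g., actual object sizes

cue_validity = correlation(proximal_cue, distal_variable)
print(f"Ecological validity of this cue: {cue_validity:.2f}")
# A coefficient near 1 means the cue is a dependable predictor of the state of
# the environment; near 0 it tells us very little (like the blue trade label
# on the apple in the example above).
```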

2. The external validity meaning

Many textbooks, including this one, have taken the position that ecological validity is an aspect of external validity and refers to the degree of generalisation that is possible from results in one specific study setting to other different settings. This has usually had an undertone of comparing the impoverished nature of the experimental environment with the greater complexity of a ‘real’ setting outside the laboratory. In other words researchers asked ‘how far will the results of this laboratory experiment generalise to life outside it?’ The general definition, however, has concerned the extent of generalisation of findings from one setting to another and has allowed for the possibility that a study in a ‘real life’ setting may produce low ecological validity because its results do not generalise to any other setting – see the Hofling study below. Most texts refer to Bracht and Glass (1968) as the originators of this sense of the term and the seminal work by Cook and Campbell (1979) also supported this interpretation.

On this view effects can be said to have demonstrated ecological validity the more they generalise to different settings and this can be established empirically by replicating studies in different research contexts.

3. The ‘pop’ version

The pop version is the definition very often taught on basic psychology courses. It takes the view that a study has (high) ecological validity so long as the setting in which it is conducted is ‘realistic’, or the materials used are ‘realistic’, or indeed if the study itself is naturalistic or in a ‘natural’ setting (e.g., Howitt, 2013). The idea is that we are likely to find out more about ‘real life’ if the study is in some way close to ‘real life’, which raises the question of whether the laboratory is not itself ‘real life’.

The problem with the pop version is that it has become a knee-jerk mantra – the more realistic the more ecological validity. There is, however, no way to gauge the extent of this validity. It is just assumed, so much so that even A-level students are asked to judge the degree of ecological validity of fictitious studies with no information given about successful replications or otherwise.

Teaching students that ecological validity refers to the realism of studies or their materials simply adds a new ‘floating term’ to the psychological glossary that is completely unnecessary since we already have the terminology. The word to use is ‘realism’. As it is, students taught the pop version simply have to learn to substitute ‘realism’ when they see ‘ecological validity’ in an examination question.

For those concerned about the realism of experimental designs Hammond (1998) points out that Brunswik (1947) introduced another perfectly suitable term. He used representative design to refer to the need to design experiments so that they sample materials from among those to which the experimenter wants to generalise effects. He asked that experimenters specify in their design the circumstances to which they wished to generalise. For instance, in a study on person perception, in the same way as we try to use a representative sample of participants, we should also use a representative sample of stimulus persons (those whom participants will be asked to make a judgment about) in order to be able to generalise effects to a wider set of perceived people. Hammond is not the only psychologist worried about the misuse of Brunswik’s term. Araújo, Davids and Passos (2007) argue that the popular ‘realism’ definition of ecological validity is a confusion of the term with representative design:

‘… ecological validity, as Brunswik (1956) conceived it, refers to the validity of a cue (i.e., perceptual variable) in predicting a criterion state of the environment. Like other psychologists in the past, Rogers et al. (2005) confused this term with another of Brunswik’s terms: representative design.’ (p.69)

This article by Araújo et al is a good place to start understanding what Brunswik actually meant by ecological validity and demonstrates that arguments to haul its meaning back to the original are contemporary and not old-fashioned. The term is in regular use in its original meaning by many cognitive psychologists. They are not clinging to a ‘dinosaur’ interpretation in the face of unstoppable changes in the evolution of human language.

Milgram v. Hofling – which is more ‘ecologically valid’?

Another problem with the pop version is that it doesn’t teach students anything at all about validity as a general concept. It simply teaches them to spot when material or settings are not realistic and encourages them to claim that this is a ‘bad thing’. It leads to confusion with the laboratory–field distinction and a clichéd positive evaluation of the latter over the former. For example, let’s compare Milgram’s famous laboratory studies of obedience with another obedience study by Hofling et al (1966), where nurses working in a hospital, unaware of any experimental procedure, were telephoned by an unknown doctor and broke several hospital regulations by starting to administer, at the doctor’s request, a potentially lethal dose of an unknown medicine. The pop version would describe Hofling’s study as more ‘ecologically valid’ because it was carried out in a naturalistic hospital setting on real nurses at work. In fact, this would be quite wrong in terms of external validity since the effect has never been replicated. The finding seems to have been limited to that hospital at that time with those staff members. A partial replication of Hofling’s procedures failed to produce the original obedience effect (Rank and Jacobson, 1977[1]), whereas Milgram’s study has been successfully replicated in several different countries using a variety of settings and materials. In one of Milgram’s variations, validity was demonstrated when it was shown that shifting the entire experiment away from the university laboratory and into a ‘seedy’ downtown office, apparently run by independent commercial researchers, did not significantly reduce obedience levels. Here, following the pop version, we seem to be in the ludicrous situation of saying that Hofling’s effect is valid even though there is absolutely no replication of it, while Milgram’s is not, simply because he used a laboratory! In fact Milgram’s study does demonstrate ecological validity on the generalisation criterion. The real problem is that there is no sense of ‘validity’ in the pop notion of ecological validity.

In a thorough discussion of ecological validity Kvavilashvili and Ellis (2004) bring the original and external validity usages together by arguing that both representativeness and generalisation are involved, with generalisation appearing to be the more dominant concept. Generalisation improves the more that representativeness is dealt with. However, they argue that a highly artificial and unrealistic experiment can still demonstrate an ecologically valid effect. They cite as an example Ebbinghaus’s memory tasks with nonsense syllables. His materials and task were quite unlike everyday memory tasks but the effects Ebbinghaus demonstrated could be shown to operate in everyday life, though they were confounded by many other factors. The same is true of research in medicine or biology; we observe a phenomenon, conduct highly artificial experiments in the laboratory (e.g., by growing cultures on a dish) and then re-interpret the results in the everyday world by extending our overall knowledge of the operation of diseases and producing new treatments. In psychology, though, it is felt that by making tasks and settings more realistic we have a good chance of increasing ecological validity. Nevertheless, ecological validity must always be assessed using research outcomes and not guessed at because a study is ‘natural’.

I think the conclusions that emerge from this review of the uses of ecological validity are that:

  • Examiners (public or institutional) should certainly not assess the term unless they are prepared to state and justify explicitly the specific use they have in mind prior to any examinations.
  • The pop version tells us nothing about formal validity and is a conceptual dead end; ‘realism’ can be used instead and ‘ecological validity’ takes us no further.
  • Rather than ‘ecological validity’ it might be more accurate to use the term ‘external validity concerning settings’. Although Cook and Campbell (1979) identified ecological validity with generalisation to other settings (i.e., external validity), in their update of that classic, Shadish, Cook and Campbell (2002) talk of external validity with regard to settings. They seem to pass the original term back to Brunswik, saying that external validity is often ‘confused with’ ecological validity. By contrast, Kvavilashvili and Ellis (2004) argue that ‘the difference between the two concepts is really small’. Obviously we cannot hope for complete agreement among academics!
  • It is ridiculous to assume that on the sole basis that a study is carried out in a natural setting or with realistic materials it must be in some way more valid than a laboratory study using more ‘artificial’ materials. Validity is about whether the effect demonstrated is genuinely causal and universal. An effect apparently demonstrated in the field can easily be non-genuine and/or extremely limited, as was Hofling’s.
  • The pop version cannot be sustained scientifically and is not of much use beyond being a technical-sounding substitute for the term ‘realism’. The original version is still used correctly by those working in perception and related fields. The external validity (generalising) version is favoured by Kvavilashvili and Ellis (over representativeness), and directs attention to a useful aspect of validity in the design of research. However, the external validity version is challenged by authors such as Hammond (1998) and Araújo et al (2007), who claim that this is not at all what Brunswik meant nor is it the way cognitive psychologists use the term. Perhaps it’s better to lie low, use alternative terminology, and see how the term evolves. I rather sense that the pop version will hang around, as will complete misunderstanding of the terms ‘null hypothesis’ and ‘negative reinforcement’.

[1] Unlike in Hofling’s study, nurses were familiar with the drug and were able to communicate freely with peers.


An article on Clever Hans – the horse that could “count” – explained by confounding:

https://www.damninteresting.com/clever-hans-the-math-horse

Articles on the replication crisis:

https://www.science.org/doi/epdf/10.1126/science.349.6251.910

https://onlinelibrary.wiley.com/doi/full/10.1111/dmcn.14054

Hammond’s article on the hijacking of the term “ecological validity”:

www.brunswik.org/notes/essay2.html