Scale Reliability and Validity
Mon 18 Nov, 2024
General reader: Breakwell, Smith & Wright (2012) – Chapter 7 (available via reading list free online)
– Based on Jung’s ideas about personality
– The four dimensions are binary, but most characteristics are normally distributed
– Very poor test-retest reliability.
– Almost no research support.
– The company behind the test, CPP, makes $20 million a year from it, so it has little incentive to start from scratch!
https://www.vox.com/2014/7/15/5881947/myersbriggs-personality-test-meaningless
– Meaning from Greek origin: ‘measuring the soul’
– Psychometrics is the field of study concerned with the theory and technique of psychological measurement, which includes the measurement of knowledge, abilities, attitudes, and personality traits
– Refers to all areas of psychology concerned with psychological measurement (methods of testing and substantive findings)
– Two major research tasks:
– (i) the construction of instruments and procedures for measurement;
– (ii) the development and refinement of theoretical approaches to measurement
– Charles Darwin’s (1809–1882) On the Origin of Species impacts scientific thinking in the 19th century
– Evolution (anthropology) combined with quantification (allure of numbers)
– Francis Galton (1822–1911) builds on cousin Darwin’s ideas with measurement and statistics
– Galton developed the theory underpinning correlation and regression
– Used this theory to try to explain the heritability of human ability and achievement (amongst many other things)
– Developed a lab and tests for many concepts e.g. prayer, boredom, beauty
Sample of affect, behaviour, cognition etc
Obtained under standardized conditions
Scored using rules that allow for comparison of individuals
Ideally, we would like:
Multiple samples
Multiple situations (contexts, several occasions)
Multiple methods
Often, must measure individuals on
So must use efficient methods
Maximum performance test (can do)
Typical performance test (will do)
– Different answer demands: effort versus candid truth
– Context dependent
Odd one out
Tree, Man, Paper, Mouse
Next in sequence
1, 1, 2, 3, 5, 8…
First 3 form a series,
which comes next: A, B or C?
Rate on a scale from 1 to 5 how true this is of you
(Costa & McCrae, 1992, Big Five)
Once I find the right way to do something, I stick to it
Dichotomous yes/no answers
(Eysenck & Eysenck, 1976, Giant 3):
I am the life of a party
Forced choice
(Zuckerman, 1979, Sensation Seeking Scale)
A: I like "wild" uninhibited parties
B: I prefer quiet parties with good conversation
Properties of psychometric tests
Two important properties of psychometric tests
Reliability
– The consistency with which a test measures the construct
Validity
– The degree to which a test actually measures what it claims to measure (“accuracy”)
A test is valid if it assesses what it claims to measure
The validity of an assessment strategy is the extent to which the strategy yields a reasonably accurate estimation of the characteristic or phenomenon in question.
Validity is established through several kinds of evidence (including concurrent validity, predictive validity, construct validity and face validity)
Test-retest reliability
– Rule of thumb: r between the two test times, 3 months apart, should be > 0.7 (r² = 0.49, i.e. just under 50% shared variance)
– Test-retest reliability is not perfect and never reaches 1: beware real changes!
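As a rough illustration of the rule of thumb above, the sketch below computes Pearson's r between two administrations of the same test, along with r squared (the proportion of shared variance). The scores are invented for illustration, and the function name pearson_r is an assumption rather than anything from the lecture.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six people tested twice, 3 months apart
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}, shared variance r^2 = {r * r:.2f}")
print("meets the > 0.7 rule of thumb:", r > 0.7)
```

Note that the function divides by zero if everyone has the same score at either time point, so it assumes there is some variation in the scores.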
Internal consistency reliability
– Internal consistency is the degree to which all items are measuring the same construct
– Cronbach’s alpha should be greater than .70 for scales with more than 10 items
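For reference, one common way to compute Cronbach's alpha is alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to a small invented data set. The function name cronbach_alpha and the responses are assumptions made for illustration only.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of item scores per respondent)."""
    k = len(responses[0])  # number of items
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so that each tuple holds all respondents' scores on one item
    items = list(zip(*responses))
    sum_item_vars = sum(variance(item) for item in items)
    # Variance of each respondent's total score across all items
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


# Hypothetical answers from five respondents to four items rated 1 to 5
responses = [
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

When all items rank respondents identically, alpha reaches 1. When the items are unrelated to one another, it falls towards 0.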
I like to think of them as Consistency and Accuracy
Behavioural observation (observer-rated)
– People scored according to behaviours observed by a rater
– Used frequently in work and clinical settings (e.g. performance appraisal)
Self-report
– Subjects indicate their level of agreement or preference concerning statements reflecting attitudes or behaviours
– Response distortion is a problem (e.g. faking a personality test)
The raw score on many psychometric tests is based on an arbitrary scale
To give the scores meaning, we compare a person’s scores to a meaningful comparison group
Statistical basis: Normal distribution
Most human traits approximate to normal curve
– Largest number of cases cluster in centre
– Area under curve can be closely specified from mean and standard deviation
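To make the comparison-group idea concrete, the sketch below converts a raw score into a z-score and a percentile, assuming the trait is normally distributed in the norm group. The norm mean of 100 and standard deviation of 15 are illustrative assumptions, as is the function name standardise.

```python
from statistics import NormalDist


def standardise(raw, norm_mean, norm_sd):
    """Return (z-score, percentile) of a raw score relative to a norm group."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    # Area under the standard normal curve to the left of z, as a percentage
    percentile = NormalDist().cdf(z) * 100
    return z, percentile


# Hypothetical norm group: mean 100, standard deviation 15
z, pct = standardise(115, norm_mean=100, norm_sd=15)
print(f"z = {z:.2f}, percentile = {pct:.1f}")

# Share of cases within one standard deviation either side of the mean
within_1sd = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"within +/- 1 SD: {within_1sd:.1%}")
```

A score exactly at the norm mean gives z = 0 and sits at the 50th percentile, which is what "largest number of cases cluster in centre" implies for a symmetric curve.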
Research Methods Lecture 07 - Psychometrics