Internal consistency has been separated from the other two types of reliability because it is fundamentally different from them: this kind of reliability is established on the basis of a single test administration. It shows how consistent scores are within a single test. Basically, it looks at whether the test agrees with itself, rather than with another measure. This can be especially useful for a test with many questions or trials. Good internal consistency usually indicates that your questions are measuring the same thing, and that any unreliability you have found is not due to fluctuations in the participants or the observers during the test, and thus must be due to standardization problems in test administration.
There are three topics of note in the internal consistency section: (a) average inter-item correlations, (b) split-half reliability, and (c) Cronbach's alpha and the Kuder-Richardson method. Each of these will be explored individually.
The average inter-item correlation involves computing correlations between the answers to each question and the answers to every other question. For example, on a five-question test, Q1 would be correlated with Q2, Q3, Q4, and Q5 individually; Q2 would be correlated with Q1, Q3, Q4, and Q5; and so on for each question. After all of these answers have been correlated with one another, the correlations are averaged, giving the "average inter-item correlation." This tells you how closely the answers to each question correspond with one another.
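As a sketch, the average inter-item correlation for a five-question test could be computed with NumPy. The scores below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: 6 participants answering a 5-question test
# (rows = participants, columns = questions Q1-Q5); values are invented.
scores = np.array([
    [4, 5, 4, 3, 4],
    [2, 2, 3, 2, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 2, 3, 3],
    [4, 4, 5, 4, 5],
    [1, 2, 1, 2, 1],
])

# Correlate every question with every other question.
r = np.corrcoef(scores, rowvar=False)  # 5 x 5 correlation matrix

# Average only the off-diagonal entries, counting each pair once
# (Q1-Q2, Q1-Q3, ..., Q4-Q5: 10 pairs for 5 questions).
upper = r[np.triu_indices_from(r, k=1)]
avg_inter_item = upper.mean()
print(round(avg_inter_item, 2))
```

A high average (close to 1) suggests the questions are all tapping the same construct; an average near 0 suggests they are not.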
Split-half reliability is the most common type of internal consistency measure. After the participant has answered all of the questions, the test is arbitrarily divided in half (for example, the first half of the test vs. the second half, or odd-numbered questions vs. even-numbered questions). Each half is then correlated with the other. This gives an estimate of how consistent results are within the test. Some things to note, though:
Each half has to measure exactly the same thing
As mentioned earlier, the more questions a test has, the greater its reliability. The split-half technique, by halving the number of questions, means that the reliability coefficient found in the split-half calculation underestimates the actual reliability of the full-length test. To fix this, researchers use the Spearman-Brown formula, which is:
(2 x r ) / (1 + r )
where r is the originally obtained split-half reliability
Although at this point in the course you may not actually be producing correlations, it is important to know that such a formula exists and the reason it is used.
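The two steps above can be sketched together in NumPy: split the test into odd- and even-numbered questions, correlate the half-test totals, then apply the Spearman-Brown correction. The scores are invented for illustration:

```python
import numpy as np

# Hypothetical data: 6 participants on a 10-question test (rows = participants).
scores = np.array([
    [4, 5, 4, 3, 4, 5, 4, 4, 3, 4],
    [2, 2, 3, 2, 2, 1, 2, 3, 2, 2],
    [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
    [3, 3, 2, 3, 3, 3, 2, 3, 3, 2],
    [4, 4, 5, 4, 5, 4, 4, 5, 4, 4],
    [1, 2, 1, 2, 1, 2, 1, 1, 2, 1],
])

# Split into odd- vs. even-numbered questions and total each half per participant.
odd_half = scores[:, 0::2].sum(axis=1)   # questions 1, 3, 5, 7, 9
even_half = scores[:, 1::2].sum(axis=1)  # questions 2, 4, 6, 8, 10

# Correlate the two half-test totals: the raw split-half reliability.
r = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: estimate the full-length reliability.
spearman_brown = (2 * r) / (1 + r)
print(round(r, 2), round(spearman_brown, 2))
```

Note that for any positive r below 1, the corrected coefficient is larger than the raw split-half correlation, which is exactly the adjustment the formula is meant to provide.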
Cronbach's alpha and the Kuder-Richardson method are advanced formulas that operate on principles similar to those of split-half reliability. Because the split-half correlation for odd- vs. even-numbered questions might be different from that for first-half vs. second-half questions, these statistical formulas effectively calculate all of the possible split-half correlations and average them, accounting for the possibility of a very different result in any one split-half combination that might go unnoticed if only one split-half were computed. The difference between the two is that the Kuder-Richardson method is used for questions for which the response is binary (there are only two options).
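As a minimal sketch, Cronbach's alpha can be computed from the item variances and the variance of the total scores. The binary answers below are invented for illustration; for binary items and a consistent variance convention, this same formula yields the Kuder-Richardson 20 (KR-20) value:

```python
import numpy as np

def cronbach_alpha(scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
    where k is the number of questions."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each question
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total test scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical binary (1 = correct, 0 = incorrect) answers:
# 6 participants, 4 questions.
binary = np.array([
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
])

print(round(cronbach_alpha(binary), 2))
```

In practice researchers rarely compute this by hand; statistics packages report alpha directly. The sketch is only meant to show that the coefficient summarizes how the items vary together relative to the total score.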
Internal consistency reliability is especially useful for IQ tests, other cognitive tests, and self-report measures of performance, all of which typically involve many questions or trials. Using internal consistency reliability allows the researcher to avoid having to retest or administer extra measures to obtain a reliability estimate. It should, however, only be used on measures in which multiple questions measure a single construct.
It is important to know when to use each of these types of reliability, and understand the difference, especially when reading research in psychology.
Reliability coefficients can mean different things depending on what is being measured. For example, although a test-retest coefficient of 0.7 - 0.8 is acceptable in most contexts, in certain areas (such as the fingerprint article in the "Why Should We Ensure Reliability" section), reliability must be very high to ensure there is very little error.
These different types of reliability can be complementary, meaning that you can calculate test-retest or equivalent-forms reliability as well as internal consistency. For example, if test-retest correlations are low, one can calculate internal consistency and equivalent-forms reliability to try to discover where the source of unreliability lies.