The First Adaptive Test: Binet's IQ Test

The basic principle of adapting a test to each examinee was recognized in the very early days of psychological measurement, even before the development of the standardized conventional paper-and-pencil test. Alfred Binet applied it in developing the Binet IQ test (Binet & Simon, 1905), which was later revised and published as the Stanford-Binet IQ Test. Binet’s test consisted of sets of test items normed by chronological age level.

Binet’s test administration procedure is a fully adaptive procedure (see example below):

1. It uses a pre-calibrated bank of test items. Binet assigned an item to an age level if approximately 50% of the children at that age level answered it correctly. In the original version of the test, there were sets of items at ages from 3 years through 11 years. All of these items constituted Binet’s item “bank” for his adaptive test.

2. It is individually administered by a trained psychologist and is designed to “probe” for the level of difficulty (i.e., chronological age) that is most appropriate for each examinee, much as hurdle-jumping probes for the performance level of each athlete.

3. It has a variable starting option. The Binet test is begun by the administrator based on her/his best guess about the examinee’s likely ability level (typically the examinee’s chronological age, but it can be lower or higher if there is information to inform such a starting level).

4. It uses a defined scoring method – a set of items at a given age level is administered and immediately scored by the administrator.

5. There is a “branching” or item selection rule that determines which items will next be administered to a given examinee. In the Binet test, the next set of items to be administered is based on the examinee’s performance on each previous set of items. If the examinee has answered some or most of the items at a given age level correctly, usually the items at the next higher age level are administered. If most of the items at a given age level are answered incorrectly, items at the next lower age level are typically administered.

6. There is a pre-defined termination rule. The Binet test is terminated when, for each examinee, both a “ceiling” and a “basal” level have been identified. The ceiling level is the age level at which the examinee incorrectly answers all items; the basal level is the age level at which the examinee correctly answers all the items. The effective range of measurement for each examinee lies between these two levels.
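
The six steps above can be sketched as a short loop. This is only an illustrative sketch, not Binet's documented procedure: the function name `administer_binet`, the callback `score_level`, the 0.5 cut-off used to decide between "mostly correct" and "mostly incorrect", and the simulated examinee are all assumptions introduced here.

```python
from typing import Callable, Dict, List


def administer_binet(
    levels: List[float],
    start: float,
    score_level: Callable[[float], float],
) -> Dict[float, float]:
    """Return {age level: proportion correct} in the order levels were given.

    levels      -- sorted age levels in the item bank (step 1)
    start       -- examiner's chosen entry level (step 3)
    score_level -- hypothetical callback: administers and scores the item
                   set at one level, returning the proportion correct (step 4)
    """
    lo = hi = levels.index(start)  # lowest / highest level given so far
    p = score_level(start)
    results = {start: p}
    basal_found = p == 1.0    # all items at a level answered correctly
    ceiling_found = p == 0.0  # all items at a level answered incorrectly

    # Step 6: stop once both a basal and a ceiling level are identified.
    while not (basal_found and ceiling_found):
        can_go_up = not ceiling_found and hi + 1 < len(levels)
        can_go_down = not basal_found and lo > 0
        if not (can_go_up or can_go_down):
            break  # item bank exhausted before both levels were found

        # Step 5 (assumed cut-off): mostly correct -> next harder set,
        # mostly incorrect -> next easier set, unless that direction is closed.
        go_up = can_go_up and (p >= 0.5 or not can_go_down)
        if go_up:
            hi += 1
            level = levels[hi]
        else:
            lo -= 1
            level = levels[lo]

        p = score_level(level)
        results[level] = p
        basal_found = basal_found or p == 1.0
        ceiling_found = ceiling_found or p == 0.0
    return results


# Hypothetical examinee: proportion correct at each age level (invented data).
simulated = {3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 0.8,
             8: 0.6, 9: 0.2, 10: 0.0, 11: 0.0}
session = administer_binet(sorted(simulated), start=8,
                           score_level=simulated.__getitem__)
print(list(session))  # order in which age levels were administered
```

For this invented examinee, starting at Age 8, the levels are administered in the order 8, 9, 7, 10, 6: the ceiling is found at Age 10 and the basal at Age 6. A human examiner may choose the search direction differently, so the order is one possibility among several.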

An examinee’s final score on the Binet test is based on the subset of items that she/he answered correctly. In effect, these items are weighted by their age level in arriving at the IQ scores derived from the test, since different examinees will answer both different numbers and different subsets of items.

A Schematic Binet Test Administration

Figure 1 illustrates a Binet test administration. The test items are organized into "mental age" levels, each of which includes items answered correctly by approximately 50% of examinees at the corresponding chronological age.

Figure 1. A Schematic Binet Test Administration

In this example, the test began with items at Age 9. The examinee answered items 1, 2, 4, 5, 6, and 10 correctly (+) and items 3, 7, 8, and 9 incorrectly. Of the 10 items administered, therefore, 60% were correctly answered. Because some of the items were answered correctly and others were not, Age 9 was neither a ceiling level (0% correct) nor a basal level (100% correct), so the test continued.

At this point, the examiner could move to items at the next higher or lower age level in an attempt to locate either a ceiling level or a basal level. This examiner chose to search first for a basal level (perhaps to provide the examinee with some positive reinforcement, since easier items would likely be answered correctly). Therefore, the test was "branched" to items at Age 8.5. These items were administered and 80% of them were correctly answered. The examiner then continued to search for a basal level by administering the next less difficult set of items at Age 8, of which 90% were correctly answered. The final branching, to Age 7.5, resulted in identification of the basal level -- 100% of these items were correctly answered by the examinee.

Following identification of the basal level, the examiner continued the test by beginning a search for the ceiling level. Because all items at age levels 7.5 through 9 had been administered, the test branched to items at Age 9.5, which was the next more difficult set of unadministered items. These items were administered and 40% were correctly answered. Because this was not a ceiling level (0% correct), the test continued with the next more difficult set of items (Age 10). The scored responses to these items resulted in 0% correct, establishing the ceiling level.
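
The sequence just described can be written out as a small trace. The sketch below only restates the percentages given in the example; the names `administered` and `classify` are introduced here for illustration.

```python
# (age level, percent correct) in the order the example administers them.
administered = [
    (9.0, 60),   # entry level
    (8.5, 80),   # downward search for a basal level
    (8.0, 90),
    (7.5, 100),  # basal level found
    (9.5, 40),   # upward search for a ceiling level
    (10.0, 0),   # ceiling level found
]


def classify(percent_correct):
    """Label a level by the basal/ceiling definitions used in the text."""
    if percent_correct == 100:
        return "basal"
    if percent_correct == 0:
        return "ceiling"
    return "neither"


labels = {age: classify(pct) for age, pct in administered}
basal = max(age for age, label in labels.items() if label == "basal")
ceiling = min(age for age, label in labels.items() if label == "ceiling")

for age, pct in administered:
    print(f"Age {age:>4}: {pct:>3}% correct -> {labels[age]}")
print(f"Effective range of measurement: Age {basal} to Age {ceiling}")
```

The test ends at the sixth item set because, at that point, both a basal level (Age 7.5) and a ceiling level (Age 10) have been identified.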

This example illustrates several characteristics of a Binet adaptive test that are shared by most adaptive tests:

1. The starting or entry point for the test can be varied for each examinee. In this case the test could have been started at virtually any age level, based on whatever information the examiner had about the examinee. Had it been started at any age level from 7.5 to 10, the same items would have been administered and the test results would have been unaffected. Had the test been started outside this range, additional items would have been administered (thereby lengthening the test) but the scores (which in the Binet test are based on the Mental Age levels of correctly answered items) would have been unaffected. For example, if the test had been started at Age 7, because these were very easy items they presumably would all have been correctly answered, establishing an extra basal level. Similarly, items at Age 10.5 should have resulted in an extra ceiling level since those items would have been more difficult than the items at Age 10.

2. The test is terminated when sufficient information has been obtained about the examinee to determine their level on the trait that underlies the items. In the case of the Binet test, the test is terminated when the items provide no further information about the examinee's ability level. Items below the basal level will be too easy for the examinee and items above the ceiling level will be too difficult -- both of these subsets of items will not provide any further information about the examinee's ability level.

3. The number of items will vary among examinees. A well-designed adaptive test is allowed to continue until sufficient information is available to measure each examinee to a predetermined level of precision. In the Binet test, this level of precision is determined by the identification of both a ceiling and basal level, regardless of how many items are required for each examinee. Other adaptive tests use other indicators of precision.

4. Each adaptive test potentially uses a different subset of items from the item bank. An adaptive test is designed to administer, from a precalibrated item bank, the subset of items that best measures each examinee. In the example in Figure 1, that set of items is those items from Age 7.5 to Age 10. Another examinee might receive items from Age 5 to Age 7.5, whereas another might receive items from Age 8 to Age 13.

5. The proportion correct for each examinee across the items administered in an adaptive test will be approximately .50, which is the item difficulty level that provides maximum information about each examinee. In the hypothetical example in Figure 1, the proportion correct across all items was .60. This characteristic of an adaptive test will tend to equalize the psychological "reinforcement environment" of the testing experience across examinees, to the extent that examinees perceive the apparent difficulty of the test items. As a result, lower-ability examinees might perceive an adaptive test as easier than conventional tests they have taken, because on conventional tests they might perceive that they were incorrectly answering most items (low percent correct, or a "difficult" test). Conversely, high-ability examinees might perceive an adaptive test as more difficult than conventional tests, because they are used to correctly answering most items on conventional tests (high percent correct, or an "easy" test).
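
Points 1 and 5 can be checked numerically against the Figure 1 example. The sketch below rests on several assumptions that are not in the text: every age level is taken to have 10 items (the text states this only for Age 9), the examinee's score at a level is taken to be the same whenever that level is reached, and Age 7 and Age 10.5 are given the 100% and 0% outcomes the text presumes. The names `PCT_CORRECT`, `ITEMS_PER_LEVEL`, and `levels_administered` are hypothetical.

```python
# Percent correct by age level. Ages 7.5-10 come from the example; Age 7
# and Age 10.5 use the outcomes the text presumes (assumption).
PCT_CORRECT = {7.0: 100, 7.5: 100, 8.0: 90, 8.5: 80,
               9.0: 60, 9.5: 40, 10.0: 0, 10.5: 0}
ITEMS_PER_LEVEL = 10  # assumption: stated in the text only for Age 9


def levels_administered(start):
    """Levels given when the test runs from `start` down to the first
    basal level (100%) and up to the first ceiling level (0%)."""
    levels = sorted(PCT_CORRECT)
    lo = hi = levels.index(start)
    while PCT_CORRECT[levels[lo]] < 100 and lo > 0:
        lo -= 1  # keep branching to easier sets until a basal is found
    while PCT_CORRECT[levels[hi]] > 0 and hi < len(levels) - 1:
        hi += 1  # keep branching to harder sets until a ceiling is found
    return levels[lo:hi + 1]


# Point 1: any entry level from 7.5 to 10 yields the same set of levels.
in_range = [levels_administered(s) for s in (7.5, 8.0, 8.5, 9.0, 9.5, 10.0)]
same_items = all(run == in_range[0] for run in in_range)
print("Same levels for every start in 7.5-10:", same_items)
print("Start at Age 7   :", levels_administered(7.0))
print("Start at Age 10.5:", levels_administered(10.5))

# Point 5: overall proportion correct across the administered items.
administered = levels_administered(9.0)
n_correct = sum(PCT_CORRECT[lvl] * ITEMS_PER_LEVEL // 100 for lvl in administered)
n_items = ITEMS_PER_LEVEL * len(administered)
print(f"Proportion correct: {n_correct}/{n_items} = {n_correct / n_items:.2f}")
```

Under these assumptions, starting at Age 7 or Age 10.5 adds one extra set of items without changing the basal-to-ceiling span, and the overall tally is 37 of 60 items correct, which is close to the .60 reported for the example.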

Reference

Binet, A., & Simon, Th. A. (1905). Méthode nouvelle pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191-244.