What Are The Characteristics of a Good Test?
This article summarizes the current strategies for optimizing the validity of a mouse model of a human brain dysfunction. What are the criteria necessary to define the model organism? Which assays are most appropriate for phenotyping the disease model? How many tests are necessary, how many replications must be conducted, and which controls are essential? In the case of neuropsychiatric disorders, which behavioral assays are sufficiently analogous to the behavioral symptoms of the human syndrome?
This overview discusses the basic concepts inherent in phenotyping animal models of human neuropsychiatric disorders. Three criteria are commonly used to validate an animal model: construct validity, face validity, and predictive validity.
Mutant mice with a targeted mutation in a gene implicated in a neuropsychiatric disorder have reasonable construct validity for that inactivation or polymorphism of the human gene. Neuroanatomical lesions, prenatal drug exposures, and environmental toxins offer other examples of putative causes of human diseases that can be replicated in animal models. For example, a mouse model of schizophrenia could test the hypothesis that the gene COMT confers susceptibility to schizophrenia by knocking out the COMT gene in the mouse genome [ Babovic et al.
Behavioral symptoms, neuroanatomical pathology, neurophysiological responses, and neurochemical abnormalities are examples of disease components or endophenotypes that can be modeled in animals. Endophenotypes are single behavioral, anatomical, biochemical, and neurophysiological markers for a given disease.
The temporal progression of a neurodevelopmental or neurodegenerative disease is approximated in the animal model by repeating assays to generate a longitudinal profile at appropriate ages. For example, autism is diagnosed by three behavioral criteria, in which aberrant reciprocal social interaction is the primary diagnostic symptom.
Criteria for Validating Mouse Models of Psychiatric Diseases
Our automated three-chambered social approach task assays aspects of sociability in mice that are most relevant to the first diagnostic symptom of autism, and can be used repeatedly in the same animals for longitudinal analyses of neurodevelopmental models [ Moy et al. For predictive validity, a specific class of drugs that ameliorates the human symptoms should reverse the traits in the animal model.
Classes of drugs that are ineffective in the human syndrome must similarly be ineffective in the animal model. For example, rodent models of depression rely on antidepressant drug reversal of immobility in the tail suspension and Porsolt forced swim tasks, which involve inescapable stressors [ Porsolt et al.
Two major goals of animal models are (1) testing hypotheses about the mechanisms underlying the disease, and (2) translational evaluation of pharmacological, behavioral, and other treatments for the disease.
The more similarities in construct, face, and predictive validity between the animal model and the human disease, the stronger the model, and the more useful it will be for meeting these two goals.
Further criteria include quantitative measures that are amenable to standard statistical analyses, methodologies that can be readily applied by many laboratories, and robust traits that are easily detectable above background variability. More importantly, results will have to be reproducible in replications across cohorts of animals in the same laboratory, and in different laboratories across geographic locations.
A highly valid behavioral phenotype of a targeted gene mutation must replicate in three independent cohorts of mice from several generations of the mutant mouse line, and in the same line tested in other laboratories. Transgenic mice, which may have a new gene added or an existing gene overexpressed, and knockout mice, in which there is a loss of function of a gene through deletion or mutation such that the protein is not correctly synthesized, have been developed for many neurotransmitters, receptors, second messengers, transporters, and transcription factors.
Conditional and inducible promoters, knock-ins of humanized gene polymorphisms, and microinjections of viral vectors containing genes and RNA interference sequences into neuroanatomical locations provide further elegant research tools. Results from these various categories of mutant mouse models are leading to a better understanding of the neurological underpinnings of behavior, and the proximal causes of human genetic disorders.
Performance tests, on the other hand, minimize the use of language and present problems that can be solved without it. They may involve manipulating objects, tracing mazes, placing pictures in the proper order, and finishing patterns, for example. This distinction is most commonly used in the case of intelligence tests, but can be used in other ability tests as well. Performance tests are also sometimes used when the test-taker lacks competence in the language of the testing.
Many of these tests assess visual spatial tasks. Historically, nonverbal measures were given as intelligence tests for non-English speaking soldiers in the United States as early as World War I.
These tests continue to be used in educational and clinical settings given their reduced language component.
Cognitive tests can also be classified as speeded tests or power tests. A truly speeded test is one on which everyone could answer every question correctly given enough time. Some tests of clerical skills are exactly like this; they may have two lists of paired numbers, for example, where some pairings contain two identical numbers and other pairings are different. The test-taker simply circles the pairings that are identical.
Pure power tests are measures in which the only factor influencing performance is how much the test-taker knows or can do. A true power test is one where all test-takers have enough time to do their best; the only question is what they can do.
Obviously, few tests are either purely speeded or purely power tests. Most have some combination of both. For example, a testing company may use a rule of thumb that 90 percent of test-takers should complete 90 percent of the questions; however, it should also be clear that the purpose of the testing affects rules of thumb such as this.
Few teachers would wish to have many students unable to complete the tests that they take in classes, for example. When test-takers have disabilities that affect their ability to respond to questions quickly, some measures provide extra time, depending upon their purpose and the nature of the characteristics being assessed.
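The completion rule of thumb mentioned above can be checked with a short calculation. The following Python sketch is illustrative only: the function names, the integer-percentage parameters, and the sample data are assumptions made for this example, not part of any testing company's actual procedure.

```python
def share_reaching(items_reached, n_items, item_pct=90):
    """Return the percentage of test-takers who reached at least
    item_pct percent of the n_items questions.

    items_reached: one integer per test-taker, the number of questions
    that person completed (hypothetical input format).
    """
    # Integer arithmetic avoids floating-point rounding at the cutoff.
    finished = sum(1 for reached in items_reached
                   if reached * 100 >= item_pct * n_items)
    return 100 * finished / len(items_reached)


def meets_rule_of_thumb(items_reached, n_items, taker_pct=90, item_pct=90):
    """True if at least taker_pct percent of test-takers completed
    at least item_pct percent of the questions."""
    return share_reaching(items_reached, n_items, item_pct) >= taker_pct


# Hypothetical data: ten test-takers on a 40-question test.
completed = [40, 40, 38, 36, 40, 35, 40, 39, 37, 40]
print(meets_rule_of_thumb(completed, n_items=40))
```

Both percentages are parameters because, as noted above, the appropriate cutoff depends on the purpose of the testing; a classroom test might reasonably use a stricter value.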
In educational and intelligence tests, recognition tests typically include multiple-choice questions where one can look for the correct answer among the options, recognize it as correct, and select it as the correct answer. In a free-recall test, by contrast, one must recall or solve the question without choosing from among alternative responses.
This distinction also holds for some non-cognitive tests, but the latter distinction is discussed later in this section because it focuses not on recognition but selections.
For example, a recognition question on a non-cognitive test might ask someone whether they would rather go ice skating or to a movie; a free recall question would ask the respondent what they like to do for enjoyment. Cognitive tests of various types can be considered as process or product tests. Take, for example, mathematics tests in school.
In some instances, only getting the correct answer leads to a correct response. In other cases, teachers may give partial credit when a student performs the proper operations but does not get the correct answer. Similarly, psychologists and clinical neuropsychologists often observe not only whether a person solves problems correctly i.
Test Administration
One of the most important distinctions relates to whether tests are group administered or are individually administered by a psychologist, physician, or technician. Tests that traditionally were group administered were paper-and-pencil measures. Often for these measures, the test-taker received both a test booklet and an answer sheet and was required, unless he or she had certain disabilities, to mark his or her responses on the answer sheet.
In recent decades, some tests are administered using technology i.
There may be some adaptive qualities to tests administered by computer, although not all computer-administered tests are adaptive (technology-administered tests are further discussed below). An individually administered measure is typically provided to the test-taker by a psychologist, physician, or technician.
More confidence is often placed in the individually administered measure, because the trained professional administering the test can make judgments during the testing that affect the administration, scoring, and other observations related to the test.
Tests can be administered in an adaptive or linear fashion, whether by computer or individual administrator.
A linear test is one in which questions are administered one after another in a pre-arranged order. In an adaptive test, by contrast, the test-taker's earlier responses determine which questions are presented next. Typically, if the test-taker answers the first questions correctly, or in accordance with preset or expected response algorithms, the next questions are progressively more difficult until the level best suited to the examinee's performance is reached or the test is completed. If one does not answer the first questions correctly, or as typically expected in the case of a non-cognitive measure, then easier questions would generally be presented to the test-taker.
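The contrast between linear and adaptive administration can be illustrated with a minimal Python sketch. It assumes a simple one-step-up, one-step-down rule over numbered difficulty levels; this rule, the function names, and the simulated test-taker are hypothetical choices for illustration, and operational adaptive tests generally rely on more elaborate item-selection methods.

```python
def run_linear_test(n_questions=8):
    """Linear administration: questions appear in a fixed,
    pre-arranged order regardless of the responses given."""
    return list(range(1, n_questions + 1))


def run_adaptive_test(answers_correctly, n_levels=10, start_level=5,
                      n_questions=8):
    """Adaptive administration (assumed up/down rule): a correct
    response leads to a harder question, an incorrect one to an
    easier question.

    answers_correctly: callable taking a difficulty level and
    returning True or False (a stand-in for the test-taker).
    Returns the difficulty levels in the order administered.
    """
    level = start_level
    administered = []
    for _ in range(n_questions):
        administered.append(level)
        if answers_correctly(level):
            level = min(n_levels, level + 1)  # step up, capped at the top
        else:
            level = max(1, level - 1)         # step down, floored at 1
    return administered


# Hypothetical test-taker who can answer questions up to difficulty 7.
levels = run_adaptive_test(lambda level: level <= 7)
print(levels)  # [5, 6, 7, 8, 7, 8, 7, 8]
```

In this sketch the administered difficulty settles around the boundary between what the simulated test-taker can and cannot answer, which is the behavior the paragraph above describes.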