Diagnostic validity is an especially important psychometric characteristic to consider when evaluating the quality and usefulness of a test or screening instrument. It refers to an instrument’s accuracy in predicting group membership (e.g., ASD versus non-ASD). Diagnostic validity can be expressed through two pairs of metrics: sensitivity and specificity, and positive predictive value (PPV) and negative predictive value (NPV). Sensitivity and specificity measure a test’s ability to correctly identify someone as having or not having a given disorder. Sensitivity is the percentage of cases with a disorder that screen positive. A highly sensitive test produces few false negative results (individuals with a disorder who screen negative), so fewer cases of the disorder are missed. Specificity is the percentage of cases without a disorder that screen negative. A highly specific test produces few false positive results (individuals without a disorder who screen positive). False negatives decrease sensitivity, while false positives decrease specificity. An efficient assessment tool should minimize false negatives, because these are individuals with a likely disorder who remain unidentified. Sensitivity and specificity levels of .80 (80%) or higher are generally recommended.
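As a minimal sketch of how these two definitions translate into arithmetic, the following Python snippet computes sensitivity and specificity from the four cells of a screening outcome table. The function names and the counts are hypothetical and chosen only for illustration; they do not come from any published study of a particular instrument.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of cases WITH the disorder that screen positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of cases WITHOUT the disorder that screen negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical sample: 50 children with ASD, 200 without.
tp, fn = 45, 5      # children with ASD: screened positive / screened negative
tn, fp = 170, 30    # children without ASD: screened negative / screened positive

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)

# Compare against the commonly recommended .80 threshold.
print(f"sensitivity = {sens:.2f}, meets .80: {sens >= 0.80}")
print(f"specificity = {spec:.2f}, meets .80: {spec >= 0.80}")
```

With these illustrative counts, each false negative lowers sensitivity and each false positive lowers specificity, which mirrors the relationship described above.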
Positive predictive value (PPV) and negative predictive value (NPV) are also important validity statistics that describe how well a screening tool or test performs. The predictive value is the probability of having (or not having) a given disorder, given the result of a test. PPV is the percentage of all cases that screened positive that truly have the disorder. It is a critical measure of the performance of a diagnostic or screening measure, as it reflects the probability that a positive test or screen correctly identifies the disorder for which the individual is being evaluated or screened. NPV is the percentage of all cases that screened negative that are truly without the disorder. The higher the PPV and NPV values, the better the instrument at correctly identifying cases. It is important to recognize that PPV is determined not only by the sensitivity and specificity of the test but also by the prevalence of the disorder in the population being tested. For example, an ASD-specific screening measure may be expected to have a higher PPV when used with a known group of high-risk children who exhibit signs or symptoms of developmental delay, social skills deficits, or language impairment. In fact, for any diagnostic test, when the prevalence of the disorder is low, the PPV will also be low, even with a test that has high sensitivity and specificity.
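The dependence of predictive values on prevalence can be illustrated with a short Python sketch based on Bayes' theorem. The function names, the sensitivity and specificity of .90, and the two prevalence figures are assumptions made purely for illustration; they are not estimates for any specific instrument or population.

```python
def ppv(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a positive screen truly has the disorder."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a negative screen is truly without the disorder."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test with sensitivity and specificity both at .90,
# applied in a low-prevalence and a high-prevalence (high-risk) group.
for prevalence in (0.02, 0.50):
    print(
        f"prevalence = {prevalence:.2f}: "
        f"PPV = {ppv(0.90, 0.90, prevalence):.3f}, "
        f"NPV = {npv(0.90, 0.90, prevalence):.3f}"
    )
```

Under these assumed values, the same test yields a PPV of about .16 when prevalence is 2% and .90 when prevalence is 50%, which is why a screening measure tends to perform better, in PPV terms, with a high-risk referred group than in the general population.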
Practitioners should carefully review the psychometric properties of assessment tools and select those with high sensitivity and PPV values. For example, rating scales such as the Autism Spectrum Rating Scales (ASRS) and Social Communication Questionnaire (SCQ) have, on average, high sensitivity and PPV, while instruments such as the Gilliam Autism Rating Scale (GARS) have been reported to underestimate the likelihood that children with autism are classified as having autism, which indicates poor sensitivity.
Lecavalier, L. (2005). An evaluation of the Gilliam Autism Rating Scale. Journal of Autism and Developmental Disorders, 35, 795-805.
Norris, M. & Lecavalier, L. (2010). Screening accuracy of level 2 autism spectrum disorder rating scales: A review of selected instruments. Autism, 14, 263-284.
Wilkinson, L. A. (2010). A best practice guide to assessment and intervention for autism and Asperger syndrome in schools. London: Jessica Kingsley Publishers.
© Lee A. Wilkinson, PhD