Our mission is to foster improvements in the analysis of health-related phenomena among older minority adults; encourage the development of methods and measures that better capture the health and determinants of health of diverse elders; promote collaboration between RCMAR sites on analysis issues; and disseminate new knowledge in this area.



Prepared by: Center for Health Improvement for Minority Elders (CHIME), UCLA

Item response theory (IRT) models are often used to investigate measurement invariance. Differential item functioning (DIF) is said to occur when the relationship between theta (the level of the underlying trait being measured) and item score differs across two or more studied populations. This is particularly important to assess when using scales to determine disparities between different groups of older adults.
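
As a rough illustration of the DIF definition above, the sketch below assumes a two-parameter logistic (2PL) IRT model, which is only one of several models used in this literature. The function names and the item parameters for the "reference" and "focal" groups are hypothetical values chosen for illustration; they are not taken from any of the papers listed here. The point is simply that two respondents with the same theta can have different probabilities of endorsing an item when the item parameters differ by group.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of endorsing an item under a 2PL model.

    theta: respondent's level on the latent trait
    a:     item discrimination
    b:     item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item parameters (illustrative only).
# Same discrimination in both groups, but the item is "harder"
# to endorse in the focal group, i.e. uniform DIF.
reference = {"a": 1.2, "b": 0.0}
focal = {"a": 1.2, "b": 0.5}


def dif_gap(theta):
    """Difference in endorsement probability at the same theta."""
    return icc_2pl(theta, **reference) - icc_2pl(theta, **focal)


if __name__ == "__main__":
    for theta in (-1.0, 0.0, 1.0):
        p_ref = icc_2pl(theta, **reference)
        p_foc = icc_2pl(theta, **focal)
        print(f"theta={theta:+.1f}  reference={p_ref:.3f}  "
              f"focal={p_foc:.3f}  gap={p_ref - p_foc:.3f}")
```

If the gap were zero at every theta, the item would function the same way in both groups. A nonzero gap at matched theta is what DIF analyses try to detect and quantify.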

The paper demonstrates that a simulated computerized adaptive test of only 5 items very accurately reproduced scores based on all 47 items in the item pool.

  • Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Intermediate-level book on IRT models and methods. Includes a good discussion of DIF.

This didactic paper provides a fantastic soup-to-nuts overview of IRT.

This article describes an ordinal logistic regression technique to assess DIF and illustrates how IRT scoring diminished the impact of DIF. The advantages of the technique relative to other approaches are discussed.

  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
This is an excellent introduction to all aspects of IRT models and methods that includes many fully worked examples. It discusses DIF testing and computerized adaptive testing.

This didactic paper illustrates the application of IRT methods to a variety of measurement issues. It covers all the basic issues in IRT measurement models.

DIF in terms of time frame (7-day and 4-week) on the Functional Assessment of Chronic Illness Therapy-Fatigue was examined in a sample of 216 cancer patients. No item displayed DIF between time frames. The 7-day time frame was recommended because it yielded more information than the 4-week time frame.

  • Lai, J. S., Teresi, J., & Gershon, R. (2005). Procedures for the analysis of differential item functioning (DIF) for small sample sizes. Evaluation and the Health Professions, 28, 283-294.
The paper reviews methods for examining DIF that can be used with relatively small sample sizes (< 200 cases).

A classic ADL measure showed DIF by age, but DIF was lower in magnitude and balanced out in an expanded measure that included IADL items.

Analyses were conducted on 4,499 older individuals who completed the CES-D. Hispanics “under responded” to items assessing positive affect relative to non-Hispanic whites. DIF was also observed in responses over time for the Hispanic subgroup.

An example of a paper examining DIF in English and Spanish versions of a patient satisfaction scale using standard IRT methods.

Didactic paper that examines DIF in English and Chinese samples using confirmatory factor analyses and IRT, drawing out similarities and differences between the two approaches. Well written and easy to understand.

The authors advocate external reviews of items to help interpret empirical DIF results. The article presents a case study using blinded reviewers along with survey translation-related DIF results.

  • Teresi, J. A. (2001). Statistical methods of examination of differential item functioning with applications to cross-cultural measurement of functional, physical and mental health. Journal of Mental Health and Aging, 7, 31-40.
A great exposition of a variety of approaches to the assessment of DIF.

DIF analyses on the PROMIS depressive symptoms measure for gender, age and education were performed on a sample of 735 individuals. Likelihood ratio tests were used and the magnitude of DIF was assessed. Problematic items identified included “I felt like crying,” “I had trouble enjoying things that I used to enjoy,” and “I felt I had no energy.” The overall magnitude and impact of DIF were small for the groups studied, but impact was relatively large for some individuals in the sample.

DIF of the Academic Medical Center Linear Disability Scale was evaluated in a multicenter study of 1,283 inpatients and outpatients. Eighteen of 72 items showed significant DIF between two groups (neurological versus internal medicine patients). However, DIF could be accounted for by inclusion of disease-specific parameters.

DIF analyses for the CES-D by gender, age and race/ethnicity were conducted in a sample of 2,773 community-dwelling older adults. A bi-factor multiple indicator and multiple causes (MIMIC) model was used to assess DIF. Blacks were more likely than whites to endorse items such as “People dislike me” and “People are unfriendly.” Men were more likely than women to endorse “I felt like a failure,” while women were more likely than men to endorse the “crying” item. Those less than 75 years old were more likely than older respondents to endorse the “I felt like a failure” item. However, gender and age DIF was trivial in size.

Last updated July 2010