Psychometric properties and performance of the Patient Reported Outcomes Measurement Information System® (PROMIS®) depression short forms in ethnically diverse groups.

Jeanne A. Teresi, Katja Ocepek-Welikson, Marjorie Kleinman, Mildred Ramirez, Giyeon Kim


Short form measures from the Patient Reported Outcomes Measurement Information System® (PROMIS®) are used widely. The present study was among the first to examine differential item functioning (DIF) in the PROMIS Depression short form scales in a sample of over 5000 racially/ethnically diverse patients with cancer.  DIF analyses were conducted across different racial/ethnic, educational, age, gender and language groups.

Methods:  DIF hypotheses, generated by content experts, informed the evaluation of the DIF analyses.  The graded item response theory (IRT) model was used to evaluate the five-level ordinal items. The primary tests of DIF were Wald tests; sensitivity analyses were conducted using the IRT ordinal logistic regression procedure.  Magnitude was evaluated using expected item score functions and the non-compensatory differential item functioning (NCDIF) and T1 indexes, both of which are based on group differences in the item curves. Aggregate impact was evaluated with expected scale score (test) response functions; individual impact was assessed by examining differences between DIF-adjusted and unadjusted depression estimates.
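As a rough illustration of the magnitude measures described above, the following sketch computes the expected item score under a graded response model and an NCDIF-style index, taken here as the mean squared difference between the focal-group and reference-group expected item scores, averaged over focal-group depression estimates. The function names, the item parameters, and the simulated depression estimates are all hypothetical; this is not the study's code or data.

```python
import numpy as np

def expected_item_score(theta, a, b):
    """Expected score (coded 0..K-1) for one graded response model item.

    a is the discrimination; b holds the K-1 ordered thresholds.
    """
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    # Cumulative probabilities P(X >= k | theta) for k = 1..K-1.
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # With scores coded 0..K-1, the expected score is the sum of these
    # cumulative probabilities.
    return p_star.sum(axis=1)

def ncdif(theta_focal, ref_params, focal_params):
    """Mean squared difference in expected item scores over focal-group thetas."""
    diff = (expected_item_score(theta_focal, *focal_params)
            - expected_item_score(theta_focal, *ref_params))
    return float(np.mean(diff ** 2))

# Hypothetical parameters for one five-level item (illustrative values only).
rng = np.random.default_rng(0)
theta_focal = rng.normal(size=1000)            # simulated focal-group estimates
ref_item = (2.0, [-0.5, 0.3, 1.0, 1.8])        # (a, thresholds), reference group
focal_item = (2.0, [-0.3, 0.5, 1.2, 2.0])      # thresholds shifted for the focal group

print(ncdif(theta_focal, ref_item, focal_item))
```

Coding the item 1 to 5 instead of 0 to 4 shifts both expected scores by the same constant, so the difference and the index are unchanged.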

Results:  Many items evidenced DIF; however, only a few had slightly elevated magnitude.  No items evidenced salient DIF with respect to NCDIF, and the scale-level impact was minimal for all group comparisons.  The following short form items might be targeted for further study because they were also hypothesized to evidence DIF.  One item, nothing to look forward to, showed slightly higher magnitude of DIF for age; conditional on depression, this item was more likely to be endorsed in the depressed direction by individuals in older groups than by the cohort aged 21 to 49. This item was also hypothesized to show age DIF.  Only one item (failure) showed DIF of slightly higher magnitude (just above threshold) for Whites vs. Asians/Pacific Islanders, in the direction of higher likelihood of endorsement for Asians/Pacific Islanders. This item was also hypothesized to show DIF for minority groups.  The impact of DIF was negligible. Conditional on depression, the items worthless and hopeless were more likely to be endorsed in the depressed direction by respondents with less than high school education vs. those with a graduate degree; the magnitude of DIF was slightly above the T1 threshold, but not that of NCDIF. These items were also hypothesized to show DIF in the direction of more feelings of worthlessness by groups with lower education.  While the magnitude and aggregate impact of DIF were small, individual impact was observed in a few instances.

Information provided was relatively high, particularly in the middle to upper (depressed) range of the distribution.  Reliability estimates were high (> 0.90) across all studied groups, regardless of estimation method.
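To make the link between information and reliability concrete, the sketch below computes graded response model item information, sums it into test information, and converts that to an approximate conditional reliability of 1 - 1/I(θ). That conversion assumes a latent depression scale with unit variance and is only one of several possible estimation methods. The function name and item parameters are hypothetical and are not taken from the study.

```python
import numpy as np

def grm_item_information(theta, a, b):
    """Fisher information of one graded response model item at each theta."""
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    b = np.asarray(b, dtype=float)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
    # Pad with the boundary cumulative probabilities P*_0 = 1 and P*_K = 0.
    cum = np.hstack([np.ones((theta.size, 1)), p_star, np.zeros((theta.size, 1))])
    probs = cum[:, :-1] - cum[:, 1:]           # category probabilities
    w = cum * (1.0 - cum)                      # P*(1 - P*), zero at the boundaries
    dprobs = a * (w[:, :-1] - w[:, 1:])        # derivatives of category probabilities
    # Category probabilities can underflow at extreme theta; the grid below avoids that.
    return (dprobs ** 2 / probs).sum(axis=1)

# Hypothetical short form: (discrimination, thresholds) for each item.
items = [
    (2.5, [0.2, 0.9, 1.6, 2.3]),
    (3.0, [0.4, 1.0, 1.7, 2.5]),
    (2.8, [0.1, 0.8, 1.5, 2.2]),
    (3.2, [0.5, 1.1, 1.8, 2.6]),
]

theta_grid = np.linspace(-2.0, 3.0, 11)
test_information = sum(grm_item_information(theta_grid, a, b) for a, b in items)
# Approximate conditional reliability, assuming unit latent variance.
reliability = 1.0 - 1.0 / test_information

for t, info, rel in zip(theta_grid, test_information, reliability):
    print(f"theta={t:+.1f}  information={info:6.2f}  reliability={rel:5.2f}")
```

With thresholds placed above the mean, as in these made-up items, information peaks in the middle to upper part of the scale, which is the pattern the abstract describes.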

Conclusions:  This was among the first studies to evaluate measurement equivalence of the PROMIS depression short forms across large samples of ethnically diverse groups.  There were few items with DIF, and none of high magnitude, thus supporting the use of PROMIS depression short form measures across such groups. These results could be informative for those using the short forms in minority populations and for clinicians using the depression short forms to evaluate individuals.


depression; PROMIS®; differential item functioning; item response theory; ethnic diversity

