Psychometric properties of questionnaires evaluating health-related quality of life and functional status in polytrauma patients with lower extremity injury

Background Long-term disability is common among polytrauma patients. However, little information exists as yet on how to adequately measure functional status and health-related quality of life following polytrauma. Aims To establish the unidimensionality, internal consistency and validity of two health-related quality of life measures and one functional status questionnaire among polytrauma patients. Methods A total of 186 patients with severe polytrauma including lower extremity injury completed the Sickness Impact Profile-136 (SIP-136), the Medical Outcomes Study 36-Item Short Health Survey (SF-36) and the Groningen Activity Restriction Scale (GARS) 15 months after injury. Unidimensionality and internal consistency were assessed by principal components analysis and Cronbach's alpha (α). To test the construct validity of the questionnaires, predetermined hypotheses were tested. Results The unidimensionality and internal consistency of the GARS and the SF-36, but not the SIP-136, were supported. The construct validity of the SF-36, GARS and to a lesser extent the SIP-136 was confirmed. Conclusion The SF-36 and the GARS appear to be preferable to the SIP-136 for use in polytrauma patients.


Introduction
People who sustain traumatic injury do not generally regain their pre-injury levels of physical functioning and experience difficulty in performing activities of daily living (ADL) [1,2]. Previous studies have suggested that the lower extremities are the most frequently injured body regions in polytrauma patients [3]. In addition, injuries of the lower extremities are believed to have a major impact on functional status and health-related quality of life (HRQoL) [3,4].
The Sickness Impact Profile-136 (SIP-136) [5] and the Medical Outcomes Study 36-Item Short Health Survey (SF-36) [6] are widely used measures of HRQoL and have been used in populations with a wide range of diagnoses and disease severity, including trauma care [1,2]. The Groningen Activity Restriction Scale (GARS) [7,8] is a questionnaire that measures patients' functional limitations in socially defined roles and tasks and is used in various countries in different populations. However, any instrument that is used to assess patients should have adequate psychometric properties and be appropriate for the patient population assessed. The reliability and validity of these questionnaires in trauma patients have not yet been established.
The aim of the present study was to investigate: 1) the unidimensionality, 2) internal consistency and 3) construct validity of the SIP-136, SF-36 and GARS among polytrauma patients with at least one injury of the lower extremity.

Participants
The present study used data from a large prospective cohort study designed to examine multiple outcomes after traumatic injury [9]. Four hundred and ninety-nine consecutive severely injured patients who were admitted to and stayed in the hospital were considered for participation in this study. From this group, children below the age of 16 years (n = 40) were excluded, as were patients who died before the final assessment of outcome (n = 100). Of the 359 eligible patients, 335 gave informed consent and participated in the prospective cohort study; the remaining 24 were lost to follow-up. The reasons were: three lived abroad, seven had untraceable addresses, eleven withdrew their consent and three had an incomplete dataset.
For the present study, patients were included if they: 1) had a fracture or injury of the lower extremity (including pelvis) with a Hospital Trauma Index (HTI) [10] of two or more, 2) had an Injury Severity Score (ISS) of 16 or above, 3) were aged 17 years or older and 4) were able to read and write Dutch. Individuals who had suffered spinal cord injury were excluded. This resulted in 186 patients being included in the analyses for the present study.

Measures
Approximately 15 months after injury, patients completed the SIP-136, SF-36 and GARS.
The SF-36 [6] includes 36 multiple-choice items and takes approximately five to ten minutes to complete. The items are grouped into eight subscales: Physical Functioning, Social Functioning, Role Limitations due to Physical Health, Role Limitations due to Emotional Problems, Mental Health, Bodily Pain, Vitality and General Health. All raw scale scores are linearly converted to a 0 to 100 scale, with higher scores indicating higher levels of functioning or well-being.
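The linear conversion to a 0 to 100 scale can be illustrated with a minimal sketch. It assumes the usual rescaling of a raw score by its lowest possible value and its possible range; the function name and the example range of 10 to 30 are illustrative and are not taken from the study.

```python
def transform_scale(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a raw subscale score to the 0-100 range.

    Assumption: transformed = (raw - lowest) / (highest - lowest) * 100.
    """
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw score lies outside the possible range")
    score_range = highest_possible - lowest_possible
    return (raw_score - lowest_possible) / score_range * 100.0


# Hypothetical subscale with a raw range of 10 to 30 and a raw score of 25.
print(transform_scale(25, 10, 30))  # 75.0
```

After this rescaling, 0 corresponds to the worst and 100 to the best possible score on a subscale, which makes the eight subscales comparable despite different numbers of items.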
The SIP-136 [5] is a standardized questionnaire consisting of 136 (yes/no) statements about health-related dysfunction. The 136 items are grouped into 12 different categories: Ambulation, Mobility, Body Care and Movement, Social Interaction, Alertness Behaviour, Emotional Behaviour, Communication, Sleep and Rest, Eating, Work, Home Management, and Recreation and Pastimes. Each item is assigned a predetermined weight. Category scores are calculated by summing the weights of all endorsed items and dividing by the maximum possible dysfunction score for that category. Scores are expressed as percentages, ranging from zero (no impairment) to 100 (maximum impairment).
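The category scoring rule described above can be sketched as follows. The item weights in the example are invented for illustration and are not the published SIP-136 weights; the function name is likewise hypothetical.

```python
def sip_category_score(endorsed, weights):
    """Percentage dysfunction score for one SIP category.

    endorsed: list of booleans, True where the patient answered 'yes'
    weights:  list of predetermined item weights (hypothetical values below)
    """
    if len(endorsed) != len(weights):
        raise ValueError("one weight is needed per item")
    obtained = sum(w for e, w in zip(endorsed, weights) if e)
    maximum = sum(weights)  # score if every item were endorsed
    return 100.0 * obtained / maximum


# Invented three-item category; items 1 and 3 are endorsed.
example_weights = [4.0, 6.0, 10.0]
print(sip_category_score([True, False, True], example_weights))  # 70.0
```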
Physical functioning was assessed by the GARS [7,8]. The GARS consists of 18 questions about daily activities, each with four response categories. Response choices range from 1) 'yes, I can do it fully and independently without any difficulty' to 4) 'no, I cannot do it without someone's help'. The questionnaire comprises 11 items (scale range 11-44) referring to Activities of Daily Living (ADL, personal care) and seven items (scale range 7-28) referring to Instrumental Activities of Daily Living (IADL, household chores). The sum score provides information on the level of difficulty a person experiences in personal care and household activities. Sum scores may range from 18 (no disability) to 72 (maximum disability).
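The GARS sum scores and their ranges follow directly from the item counts. The sketch below assumes, purely for illustration, that the 11 ADL responses are listed before the 7 IADL responses; the actual item order in the questionnaire may differ.

```python
N_ADL_ITEMS = 11   # personal care items
N_IADL_ITEMS = 7   # household items


def gars_scores(responses):
    """Return ADL, IADL and total sum scores from 18 responses coded 1-4.

    Assumption: the first 11 responses are the ADL items, the last 7 the IADL items.
    """
    if len(responses) != N_ADL_ITEMS + N_IADL_ITEMS:
        raise ValueError("the GARS has 18 items")
    if any(r not in (1, 2, 3, 4) for r in responses):
        raise ValueError("responses must be coded 1 to 4")
    adl = sum(responses[:N_ADL_ITEMS])
    iadl = sum(responses[N_ADL_ITEMS:])
    return {"ADL": adl, "IADL": iadl, "total": adl + iadl}


print(gars_scores([1] * 18))  # {'ADL': 11, 'IADL': 7, 'total': 18}
print(gars_scores([4] * 18))  # {'ADL': 44, 'IADL': 28, 'total': 72}
```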
The following additional variables were assessed: age, gender, length of hospital stay, length of stay in the Intensive Care Unit, discharge destination, pain and co-morbidity.

Statistical analyses
To test the unidimensional structure of the subscales of the questionnaires, principal components analysis was applied. Items with a loading below 0.40 were considered inadequately representative of the underlying dimension. Analyses were performed separately for each subscale. In the analysis of the SIP-136, three items were excluded because no patients scored positive on these items. To further explore the associations between the subscales of the SIP-136, SF-36 and GARS, Spearman rank correlations between the subscale scores were calculated.
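The loading criterion can be illustrated with a small, self-contained sketch. It is not the software or procedure used in the study: it assumes a principal components analysis of the item correlation matrix, extracts only the first component by power iteration, and uses simulated data in which four items share one latent dimension and a fifth item is unrelated. All names and data are hypothetical.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix of the item columns in rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    ss = [sum(c[j] ** 2 for c in centred) for j in range(k)]
    return [[sum(c[a] * c[b] for c in centred) / math.sqrt(ss[a] * ss[b])
             for b in range(k)] for a in range(k)]


def first_component(corr, iterations=1000):
    """Largest eigenvalue and item loadings of the first principal component."""
    k = len(corr)
    vec = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):  # power iteration
        prod = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    eigenvalue = sum(vec[i] * sum(corr[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    return eigenvalue, loadings


# Simulated subscale: items 0-3 reflect one latent trait, item 4 is pure noise.
random.seed(1)
rows = []
for _ in range(200):
    latent = random.gauss(0, 1)
    items = [latent + random.gauss(0, 0.5) for _ in range(4)]
    items.append(random.gauss(0, 1))
    rows.append(items)

eigenvalue, loadings = first_component(correlation_matrix(rows))
weak_items = [i for i, value in enumerate(loadings) if abs(value) < 0.40]
print(round(eigenvalue, 2), [round(value, 2) for value in loadings], weak_items)
```

In this simulation the unrelated item is the one flagged as loading below 0.40, which is the pattern the criterion is meant to detect.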
Internal consistency was investigated for all subscales of the SIP-136, SF-36 and GARS by calculating Cronbach's alpha (α).
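For reference, Cronbach's α for a scale with k items is k / (k − 1) × (1 − sum of item variances / variance of the sum score). A minimal sketch of that formula, using invented responses from four respondents on a three-item scale, is:

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented data: four respondents, three items.
example = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 3, 4],
]
print(round(cronbach_alpha(example), 3))  # 0.875
```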
Construct validity refers to the extent to which scores on a particular instrument relate to other measures in a manner that is consistent with theoretically derived hypotheses concerning the constructs that are being measured. To test the construct validity of the physical subscales of the SIP-136, SF-36 and GARS the following hypotheses were tested: HRQoL and functional status will be worse for 1) older patients, 2) patients with a longer hospital stay, 3) patients with a longer ICU stay, 4) patients who are discharged to other institutions instead of going home and 5) patients who experience more pain. Odds ratios (OR) and their 95% confidence intervals (CIs) were calculated using logistic regression. In all regression analyses, age and co-morbidity were included as control variables.
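An odds ratio and its 95% confidence interval are obtained from a logistic regression coefficient by exponentiation. The sketch below shows only this conversion, assuming a Wald interval; the coefficient and standard error are invented, and the way the outcome scores were dichotomised for the regressions is not covered here.

```python
import math


def odds_ratio_with_ci(beta, standard_error, z=1.96):
    """Odds ratio and Wald 95% CI from a logistic regression coefficient."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * standard_error)
    upper = math.exp(beta + z * standard_error)
    return odds_ratio, lower, upper


def ci_excludes_one(lower, upper):
    """A CI that excludes 1 corresponds to p < 0.05 for the Wald test."""
    return lower > 1.0 or upper < 1.0


# Hypothetical coefficient for one predictor, adjusted for the control variables.
odds_ratio, lower, upper = odds_ratio_with_ci(beta=0.69, standard_error=0.25)
print(round(odds_ratio, 2), round(lower, 2), round(upper, 2),
      ci_excludes_one(lower, upper))
```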

Results
The characteristics of the study population are presented in Table 1. Table 2 contains the mean, standard deviation, median, score range and proportion of patients with the minimum and maximum score (floor and ceiling effect) for the (sub)scales of the questionnaires. All domains of the SIP-136 were skewed towards higher (worse) values, with large percentages of patients scoring the minimum (best) score, indicating a floor effect. Fewer ceiling effects were observed for the SF-36 than for the SIP-136. Four scales of the SF-36 were skewed towards lower (worse) values, with relatively large numbers of patients scoring the maximum value (ceiling effect) on three of those four scales. The GARS Total, ADL and IADL scores were skewed towards higher (worse) values, with large percentages of patients scoring the minimum (best) score.
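Floor and ceiling effects are expressed here as the percentage of patients at the minimum and maximum possible score of a scale. A minimal sketch with invented GARS total scores (possible range 18 to 72) is:

```python
def floor_ceiling(scores, minimum, maximum):
    """Percentage of respondents at the minimum and maximum possible score."""
    n = len(scores)
    at_minimum = sum(1 for s in scores if s == minimum)
    at_maximum = sum(1 for s in scores if s == maximum)
    return 100.0 * at_minimum / n, 100.0 * at_maximum / n


# Invented GARS total scores for ten patients.
example_scores = [18, 18, 20, 25, 18, 72, 30, 18, 40, 18]
floor_pct, ceiling_pct = floor_ceiling(example_scores, 18, 72)
print(floor_pct, ceiling_pct)  # 50.0 10.0
```

Whether the minimum represents the best or the worst state depends on the scoring direction of the instrument: it is the best score for the SIP-136 and GARS, and the worst for the SF-36.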
Principal components analysis confirmed that the SF-36 and GARS were unidimensional. The domains of the SF-36 showed eigenvalues ranging from 1.6 to 6.7. The proportions of variance accounted for ranged from 51% to 89%. All items loaded adequately on their respective subscales (loading > 0.40). For the ADL scale of the GARS, all 11 items loaded on one component (eigenvalue of 6.82, percentage of variance accounted for was 62%). For the IADL score, also all items loaded on one component (eigenvalue of 5.3, percentage explained variance was 75.5%).
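The reported eigenvalues and percentages of explained variance can be cross-checked. Assuming the analysis was run on the item correlation matrix (so that the total variance equals the number of items), the percentage of variance explained by a component is its eigenvalue divided by the number of items. The function name is illustrative.

```python
def percentage_variance(eigenvalue, n_items):
    """Percentage of total variance explained, assuming standardized items."""
    return 100.0 * eigenvalue / n_items


print(round(percentage_variance(6.82, 11), 1))  # ADL scale, 11 items: 62.0
print(round(percentage_variance(5.3, 7), 1))    # IADL scale, 7 items: 75.7
```

The ADL figure matches the reported 62%. The IADL figure of 75.7% is close to the reported 75.5%; a difference of this size would be expected if the eigenvalue was rounded to one decimal, since 75.5% of seven items corresponds to an eigenvalue of about 5.29.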
The unidimensionality of the SIP-136 was not supported. The 12 domains of the SIP-136 showed eigenvalues ranging from 1.9 to 6.0. The percentages of variance accounted for ranged from 24.6% to 43.2%. In only three subscales did all items have a loading > 0.40, whereas the other nine subscales each contained between 1 and 12 items that did not load adequately on the scale (loading < 0.40).
Seven of the twelve subscales of the SIP-136 had a Cronbach's α higher than 0.70. The other five subscales had a relatively small number of constituent items. All subscales of the SF-36 exceeded the minimum required value of 0.70 for group comparison. The Cronbach's α values of the GARS total score and the two GARS subscales ranged from 0.94 to 0.96.
To test construct validity, forty logistic regression models were computed, controlling for age and co-morbidity. Thirty-eight hypotheses were supported (p < 0.05) (Table 3). The two unsupported hypotheses concerned the SIP-136.

Discussion
The results of our study support the reliability (unidimensionality, internal consistency) and construct validity of the SF-36 and GARS in a polytrauma population with lower extremity injury. Whereas the construct validity of the SIP-136 in this population was supported, the unidimensionality and internal consistency of its subscales were not.
The analysis of the GARS showed that the ADL and IADL scales can be used as separate (unidimensional) scales but the strong association between the two scales indicated that the scales do not measure different aspects of functional outcome. Other studies [7,11,12] also suggest one strong and reliable factor representing one underlying dimension of functional limitations.
Our study raises questions on the unidimensionality of most subscales of the SIP-136, suggesting that these subscales are not appropriate for use in a polytrauma population with injury of the lower extremity.
The internal consistency of the SIP-136 in the present study was low for most subscales. To our knowledge, little information about the internal consistency of the 12 separate scales of the SIP-136 is available in the literature. One study reported sufficiently high Cronbach's α values for the separate categories [13], while two other studies assessing the subscales of the SIP-136 reported low Cronbach's α values [14,15]. High Cronbach's α values have been reported for the total SIP-136 [5,13,16] and for the physical and psychological dimension scores [13]. However, because Cronbach's α depends on the number of items in a questionnaire, a high α coefficient for the sum scores of the SIP-136 is not surprising and not informative.
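The dependence of Cronbach's α on the number of items can be made concrete with the standardized form of the coefficient, α = k·r / (1 + (k − 1)·r), where k is the number of items and r the mean inter-item correlation. The sketch below holds r fixed at an assumed, deliberately low value of 0.10; the item counts are chosen only to contrast a short subscale with a 136-item total score.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach's alpha from item count and mean inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)


# Same weak mean inter-item correlation (assumed 0.10), increasing scale length.
for n_items in (3, 12, 136):
    print(n_items, round(standardized_alpha(n_items, 0.10), 3))
# 3 0.25
# 12 0.571
# 136 0.938
```

With items that are only weakly related to each other, a 136-item sum score can still reach an α above 0.90, whereas a three-item subscale with the same inter-item correlation stays far below 0.70.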
Our findings supported the construct validity of the SF-36 and GARS, which is consistent with the literature [3,11,17,18]. In our patient group, the construct validity of the SIP-136 was supported to a lesser extent. Ho et al. [16] found an advantage in using the SF-36 over the SIP because of its more robust construct validity, while others found some evidence to support the construct validity of the SIP [13,15,19]. The present study provides information about internal consistency and construct validity but not about other psychometric properties such as sensitivity to change over time and test-retest reliability.
Additionally, other instruments may be suitable for this study population. Based on our results, further psychometric testing of the SF-36 and GARS in this population is recommended.

Conclusion
The SF-36 and the GARS appear to be preferable to the SIP-136 for use in polytrauma patients.