Hundreds of HRQOL and other PRO instruments are available.41–43 Table 2–1 provides a taxonomy of the different types of instruments.44 A primary distinction among HRQOL instruments is whether they are generic or specific.
TABLE 2-1 Taxonomy of Health-Related Quality of Life Instruments
|Disease specific (e.g., diabetes)|
|Population specific (e.g., frail older adults)|
|Function specific (e.g., sexual functioning)|
|Condition or problem specific (e.g., pain)|
Generic, or general, HRQOL instruments are designed to be applicable across all diseases or conditions, across different medical interventions, and across a wide variety of populations.45 Table 2–2 lists the dimensions or domains of five generic instruments. Although no longer commonly used, the Nottingham Health Profile (NHP)47 and the Sickness Impact Profile (SIP)49 are included because of their historical significance to the field of health status and HRQOL assessment. When choosing or evaluating an instrument, one must consider the specific dimensions of functioning and well-being that it covers. The instruments in Table 2–2 share common dimensions, but they also reflect the diversity and range of dimensions covered. The two main types of generic instruments are health profiles and preference-based measures.
TABLE 2-2 Domains Included in Selected Generic Instruments
|Nottingham Health Profile (NHP)47|
|Part I: Distress within the following domains|
|Part II: Health-related problems within the following domains|
|Quality of Well-Being Scale (QWB)48|
|Sickness Impact Profile (SIP)49|
|Sleep and rest||Home management|
|Eating||Recreation and pastimes|
|Work||Body care and movement|
|Health Utilities Index (HUI)—Mark III50|
|Speech||Pain and discomfort|
Health profiles provide an array of scores representing individual dimensions or domains of HRQOL or health status. An advantage of a health profile is that it provides multiple outcome scores that may be useful to clinicians and/or researchers attempting to measure differential effects of a condition or its treatment on various QOL domains.
A commonly used profile instrument is the Medical Outcomes Study 36-Item Short Form Health Survey (SF-36).51 The instrument covers nine health concepts or scales (Table 2–3). The SF-36 has several advantages. It is brief (it takes approximately 5 to 10 minutes to complete), and its reliability and validity have been documented in many clinical situations and disease states.52,53 In addition, a method is available for aggregating the items into physical (PCS) and mental (MCS) component summary scores.54 An abbreviated version containing only 12 items (SF-12) is also available.55 However, the scale scores and the PCS and MCS scores derived from the SF-12 are based on fewer items and fewer defined levels of health; as a result, they are estimated with less precision and less reliability. This loss of precision and reliability can be a problem in small samples and/or when the expected effect size of an intervention is small. An example of the use of SF-36 data in the labeling (i.e., prescribing information) for a biopharmaceutical product (thyrotropin alfa for injection) is available at http://www.thyrogen.com/pdfs/pi.pdf. A more recent version of the SF-36 (SF-36v2) retains the same scale structure.56 For further information on the SF-36 and the other health status measures and versions derived from it, visit www.SF36.org.
TABLE 2-3 SF-36 Scales and Number of Items per Scale (SF-36/SF-12)
|Physical functioning (10/2)|
|Role limitations attributed to physical problems (4/2)|
|Bodily pain (2/1)|
|General health (5/1)|
|Vitality (4/1)|
|Social functioning (2/1)|
|Role limitations attributed to emotional problems (3/2)|
|Mental health (5/2)|
|Health transition (1/0)|
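As a quick consistency check on the item counts in Table 2–3, the Python sketch below tallies items per scale. It assumes the standard Vitality scale (4 SF-36 items, 1 SF-12 item), which must be counted for the totals to reach 36 and 12; the dictionary and function names are illustrative only.

```python
# Items per scale as (SF-36 items, SF-12 items), following Table 2-3.
# Vitality (4/1) is an assumption added here so that the totals reach 36 and 12.
SCALES = {
    "Physical functioning": (10, 2),
    "Role limitations attributed to physical problems": (4, 2),
    "Bodily pain": (2, 1),
    "General health": (5, 1),
    "Vitality": (4, 1),
    "Social functioning": (2, 1),
    "Role limitations attributed to emotional problems": (3, 2),
    "Mental health": (5, 2),
    "Health transition": (1, 0),
}


def item_totals(scales):
    """Return (SF-36 total, SF-12 total) summed over all scales."""
    sf36 = sum(n36 for n36, _ in scales.values())
    sf12 = sum(n12 for _, n12 in scales.values())
    return sf36, sf12


def multi_item_scales(scales):
    """Names of the concepts measured by more than one SF-36 item."""
    return [name for name, (n36, _) in scales.items() if n36 > 1]


if __name__ == "__main__":
    print(item_totals(SCALES))                          # (36, 12)
    print(len(SCALES), len(multi_item_scales(SCALES)))  # 9 8
```

The count of nine concepts, eight of them multi-item scales, matches the SF-36 description of nine health concepts with a single-item health transition question.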
In discussing health profiles, it is important to point out that individual dimensions or scales of multidimensional HRQOL measures (e.g., physical health, social functioning) do not themselves represent HRQOL; they are components or elements that, when combined, embody the concept or construct of HRQOL. Improvement in some but not all of the dimensions does not allow one to conclude that HRQOL improved; all that can be said is that those specific dimensions improved.
HRQOL as assessed by preference-based measures is a single overall index score on a scale anchored by 1.0 (full health) and 0.0 (dead). Health states considered worse than dead can be reflected by negative numbers on the scale. This approach combines the measurement of an individual's health status with an adjustment for the relative desirability of, or preference for, that health state. The preferences are measured or assigned empirically through a variety of procedures. Although often called health state utilities, the term preferences is used in this chapter as the broader term because it subsumes both utilities and values.57
Preference-based measures are useful in health services and pharmacoeconomic research, specifically cost-utility analysis (CUA).58 CUA, an economic technique discussed in Chapter 1, involves comparing the incremental costs of an intervention (e.g., a medication) with its incremental outcomes expressed in units such as quality-adjusted life years (QALYs) gained. QALYs gained is an outcome measure that incorporates both quantity and quality of life. This can be a key outcome measure, especially in diseases such as cancer, in which the treatment itself can have a major impact on patient functioning and well-being. Numerous published studies have used CUA to evaluate the economic efficiency of healthcare interventions, including pharmaceuticals and medical devices. A review of CUAs published from 1976 to 2006 by Neumann et al.59 found that the number of CUAs increased markedly over that period and that the quality of studies is improving. CUA data compiled during this extensive review are available at www.cearegistry.org.
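The arithmetic behind CUA can be sketched in a few lines. In the minimal Python example below, QALYs are computed as the sum of each health state preference multiplied by the years lived in that state, and the incremental cost-effectiveness ratio (ICER) is the incremental cost divided by the incremental QALYs gained. All costs, preferences, and durations are hypothetical, and the sketch ignores discounting, which real CUAs typically apply.

```python
def qalys(periods):
    """Sum of preference x years over (preference, years) pairs."""
    return sum(preference * years for preference, years in periods)


def icer(cost_new, cost_old, qalys_new, qalys_old):
    """Incremental cost per QALY gained (new intervention vs. comparator)."""
    delta_qalys = qalys_new - qalys_old
    if delta_qalys == 0:
        raise ValueError("No QALY difference: the ICER is undefined")
    return (cost_new - cost_old) / delta_qalys


if __name__ == "__main__":
    # Hypothetical cohorts: 4 years at preference 0.75 vs. 4 years at 0.5.
    treated = qalys([(0.75, 4)])
    untreated = qalys([(0.5, 4)])
    print(treated, untreated)                       # 3.0 2.0
    print(icer(25_000, 5_000, treated, untreated))  # 20000.0
```

With these inputs, the intervention gains 1.0 QALY at an incremental cost of 20,000, giving an ICER of 20,000 per QALY gained.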
QALYs can be produced by increases in quality and/or length of life. Figure 2–1 represents a case in which QALYs were gained through an increase in HRQOL alone. The top curve represents the hypothetical life course of a cohort of individuals receiving a specific healthcare intervention compared with the life course of a cohort (i.e., lower curve) that did not receive the intervention. Average age at death did not differ between the two cohorts, but the intervention led to improvements in HRQOL in the treatment cohort. The area between the curves represents the QALYs gained through the intervention. This hypothetical case reflects a chronic disease, such as osteoarthritis, in which functioning and well-being are increased but survival remains unchanged. Other hypothetical combinations of quality and quantity of life can be graphed in this manner. For example, an alternative scenario could reflect a temporary decrease in HRQOL but an increase in survival that may result from a chemotherapeutic regimen for cancer.
Figure 2–1. QALYs gained (i.e., area between the curves) as the outcome of a hypothetical healthcare intervention, such as a drug. (QALY, quality-adjusted life-year.)
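The area between the curves in Figure 2–1 can be approximated numerically. The Python sketch below, an illustration rather than a published method, samples two hypothetical HRQOL trajectories on a common time grid (with the same age at death, as in the figure) and integrates each with the trapezoidal rule; the difference between the two areas is the QALYs gained.

```python
def trapezoid_area(times, values):
    """Approximate the area under a piecewise-linear curve (trapezoidal rule)."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return sum(
        (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2
        for i in range(1, len(times))
    )


def qalys_gained(times, hrqol_treated, hrqol_untreated):
    """Area between the treated and untreated HRQOL curves."""
    return trapezoid_area(times, hrqol_treated) - trapezoid_area(times, hrqol_untreated)


if __name__ == "__main__":
    years = [0, 10, 20, 30]            # hypothetical follow-up grid
    treated = [0.75, 0.75, 0.5, 0.0]   # HRQOL reaches 0 (dead) at year 30
    untreated = [0.5, 0.5, 0.25, 0.0]  # same age at death, lower HRQOL
    print(qalys_gained(years, treated, untreated))  # 6.25
```

A survival gain, as in the chemotherapy scenario, could be represented the same way by extending the treated curve to a later time point before it reaches 0.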
Sidebar: HRQOL Controversy
Although the QALY is the most commonly used health outcome summary measure, it is not the only one. Other conceptually equivalent outcomes include years of healthy life (YHL), well years (WYs), health-adjusted person years (HAPYs), and health-adjusted life expectancy (HALE). An alternative concept called healthy year equivalents (HYEs) has been proposed as theoretically superior to QALYs, but its practical significance has been limited.
Direct Measures of Health State Preferences
The most commonly used direct measurement techniques include visual analog scales, standard gamble, and time trade-off.57
The visual analog scale (VAS) is a line, typically 10 to 20 cm in length, with the end points well-defined (e.g., 0 = worst imaginable health state and 100 = best imaginable health state). The respondent is asked to mark the line where he or she would place a real or hypothetical health state in relation to the two end points. In addition, because death may not always be considered the worst possible health state, the subject's placement of death on the scale in relation to the other health states must be explicitly elicited. If a subject has placed death at 0 and rates a health state at the midpoint between 0 and 100 on the scale, that subject's preference for that health state is 0.5.
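When death is not placed at 0, the raw VAS rating must be rescaled before it can be read as a preference on the 1.0 (full health) to 0.0 (dead) scale. The Python sketch below shows one common linear rescaling, assuming that the top anchor (100) corresponds to full health; the function name and example ratings are hypothetical.

```python
def vas_preference(state_mark, death_mark, best_mark=100.0):
    """Rescale a VAS rating so that death maps to 0.0 and the best anchor to 1.0.

    Ratings placed below death yield negative values (worse than dead).
    """
    if best_mark == death_mark:
        raise ValueError("death and best anchors must differ")
    return (state_mark - death_mark) / (best_mark - death_mark)


if __name__ == "__main__":
    print(vas_preference(50, 0))   # 0.5, the example in the text
    print(vas_preference(60, 20))  # 0.5: death placed at 20 on the line
    print(vas_preference(10, 20))  # -0.125: state rated worse than death
```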
The standard gamble offers a choice between two alternatives: choice A, living in health state i with certainty, or choice B, taking a gamble on a new treatment for which the outcome is uncertain. Figure 2–2 shows this gamble.57 The subject is told that a hypothetical treatment will lead to perfect health, for a defined remaining lifetime, with a probability of P or immediate death with a probability of 1 – P. The subject can choose between remaining, for the same defined lifetime, in state i, which is intermediate between healthy and dead, or taking the gamble and trying the new treatment. The probability P is varied until the subject is indifferent between choices A and B. For example, if a subject is indifferent between the choices A and B when P = 0.75, the preference (i.e., utility) of state i is 0.75.
Figure 2–2. Standard gamble for a chronic health state. The subject is offered the choice between A and B. A involves the certainty of living in health state i (a suboptimal health state) for a specified period of time. B involves an intervention that could lead to full health for the same period of time or immediate death. The probabilities associated with the outcomes of healthy and dead are P and 1 – P, respectively. As P is varied, the indifference point between choices A and B represents the utility of state i.
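The search for the standard gamble's indifference point can be sketched as follows. At indifference, the utility of state i equals P × 1.0 + (1 – P) × 0.0, which is simply P. The Python sketch below varies P by bisection, which is an assumed search strategy (actual interviews may present probabilities in other sequences), and a simulated respondent stands in for a real subject.

```python
def standard_gamble(prefers_gamble, tol=1e-3):
    """Find the probability P at which the respondent is indifferent.

    prefers_gamble(p) returns True if the respondent would choose the gamble
    (full health with probability p, immediate death with probability 1 - p)
    over living in state i with certainty.
    """
    low, high = 0.0, 1.0
    while high - low > tol:
        p = (low + high) / 2
        if prefers_gamble(p):
            high = p  # gamble accepted: indifference point is below p
        else:
            low = p   # certainty preferred: indifference point is at or above p
    return (low + high) / 2  # utility of state i


if __name__ == "__main__":
    # Simulated respondent whose true utility for state i is 0.75.
    def respondent(p):
        return p > 0.75

    print(round(standard_gamble(respondent), 2))  # 0.75
```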
Figure 2–3 represents the time trade-off (TTO) technique for a chronic disease state.57 Here, the subject is offered a choice of living for a variable amount of time x in perfect health or a defined amount of time t in a health state i that is less desirable. By reducing the time x of being healthy (at 1.0) and leaving the time t in the suboptimal health state fixed, an indifference point can be determined (h_i = x/t). For example, a subject may indicate that undergoing chronic hemodialysis for 2 years is equivalent to perfect health for 1 year. Therefore, the value of that health state would be 0.5 (h_i = 1/2).
Figure 2–3. Time trade-off for a chronic health state. The subject chooses between living a varying amount of time in full health (x) and living a specified amount of time (t) in state i. The length of time in full health is shortened until the subject is indifferent between the two choices. The value of health state i (h_i) is then calculated as x/t.
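The TTO procedure can be sketched the same way. The Python example below searches for the indifference value of x by bisection (an assumed search strategy; the text describes only shortening x until indifference) and returns h_i = x/t. The simulated respondent reproduces the hemodialysis example, so the result is about 0.5.

```python
def time_trade_off(prefers_full_health, t, tol=1e-3):
    """Return h_i = x / t at the respondent's indifference point.

    prefers_full_health(x) returns True if x years in full health are
    preferred to t years in health state i.
    """
    low, high = 0.0, t
    while high - low > tol:
        x = (low + high) / 2
        if prefers_full_health(x):
            high = x  # full health still preferred: shorten x further
        else:
            low = x   # state i preferred: x was shortened too far
    return ((low + high) / 2) / t


if __name__ == "__main__":
    # Simulated respondent from the hemodialysis example: 2 years on
    # dialysis (t = 2) is equivalent to 1 year in perfect health.
    def respondent(x):
        return x > 1.0

    print(round(time_trade_off(respondent, t=2.0), 2))  # 0.5
```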
Sidebar: HRQOL Controversy
There is considerable debate regarding the best approach to the direct measurement, or elicitation, of health state preferences. The empirical literature consistently shows that preferences derived through different elicitation methods differ. Although there have been calls for the development of standardized preference elicitation protocols, the lack of consensus is likely to continue for the foreseeable future.
Multiattribute Health Status Classification Systems
In addition to direct measures, instruments are available for which the health state preferences have been derived empirically through population studies. The instruments are administered to assess respondents' health status or health state, which then is mapped onto a multiattribute health status classification system. Examples of such instruments include the Quality of Well-Being Scale (QWB),48 the Health Utilities Index (HUI),50 the EuroQoL Group's EQ-5D,46 and the SF-6D.60 Although each is described briefly below, more thorough descriptions of these four instruments are provided elsewhere.43,61
The QWB is a generic HRQOL instrument that includes symptoms or problems plus three dimensions of functional health status (see Table 2–2). Standardized preference values for the health states represented by the QWB have been measured (via the category rating scale method, a technique related to the VAS) and validated on a general population sample.48 The QWB was available originally only as an interviewer-administered version, but a self-administered version now is available.62
The HUI is another generic instrument that describes the health status of a person at a point in time in terms of his or her ability to function on a set of attributes or dimensions of health status. The HUI Mark II/III is available as a 15-item self-administered form. The measurements for the development of the health state preference system were made with VASs and the standard gamble technique. The dimensions covered in the most recent version of the HUI (Mark III) are listed in Table 2–2.50
The EQ-5D was designed to be self-administered and short enough to be used in conjunction with other measures.46 The first of its two parts classifies subjects into one of 243 health states defined by five dimensions. Sets of TTO-based preference weights derived from the general U.S. adult population are available for the 243 EQ-5D health states.63,64 The second part of the EQ-5D is a 20-cm VAS with end points labeled “best imaginable health state” and “worst imaginable health state,” anchored at 100 and 0, respectively. Respondents are asked to rate their own health by drawing a line from an anchor box to the point on the VAS that best represents their health on that day.
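The figure of 243 health states follows from five dimensions with three levels each (3^5 = 243). The Python sketch below enumerates them as five-digit codes, assuming the original three-level EQ-5D dimensions (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression). Attaching TTO preference weights would require one of the published value sets, which is not reproduced here.

```python
from itertools import product

DIMENSIONS = (
    "Mobility",
    "Self-care",
    "Usual activities",
    "Pain/discomfort",
    "Anxiety/depression",
)
LEVELS = (1, 2, 3)  # 1 = no problems, 2 = some problems, 3 = extreme problems


def eq5d_states():
    """All EQ-5D health states as codes such as '11111' or '21312'."""
    return [
        "".join(str(level) for level in combo)
        for combo in product(LEVELS, repeat=len(DIMENSIONS))
    ]


if __name__ == "__main__":
    states = eq5d_states()
    print(len(states), states[0], states[-1])  # 243 11111 33333
```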
Because the SF-36 does not itself produce a preference-based index, and given its dominance among profile measures, there was significant interest in deriving a health index score from it so that it could be incorporated into economic evaluations involving QALYs. To address this limitation, Brazier et al.65 developed a preference-based index that used health state classifications derived from the SF-36 items. The resulting multiattribute health status classification system is called the SF-6D.60 The current version of the SF-6D is based on 11 SF-36 items. With four to six levels for each of six dimensions, it defines 18,000 possible health states. A UK general population study was conducted to elicit preferences for a sample of the SF-6D health states using a standard gamble technique. A model was then constructed to estimate mean preferences for all possible SF-6D health states.
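The 18,000 figure is the product of the number of levels across the six dimensions. The sketch below assumes the level counts of the published SF-6D classification (physical functioning 6, role limitations 4, social functioning 5, pain 6, mental health 5, vitality 5); it only counts states and does not reproduce the preference model.

```python
from math import prod

# Assumed SF-6D level counts per dimension (see lead-in).
SF6D_LEVELS = {
    "Physical functioning": 6,
    "Role limitations": 4,
    "Social functioning": 5,
    "Pain": 6,
    "Mental health": 5,
    "Vitality": 5,
}


def n_health_states(levels):
    """Number of distinct health states a classification system defines."""
    return prod(levels.values())


if __name__ == "__main__":
    print(n_health_states(SF6D_LEVELS))  # 18000
```

The same function gives 243 for the EQ-5D (five dimensions with three levels each).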
Sidebar: HRQOL Controversy
Whose preferences should be used in the calculation of QALYs for CUA? Some authors have argued that health state preferences elicited from the general population should not be applied to specific patient groups. However, when public resource allocation decisions are being made, general population preferences may be the most appropriate.
Specific or targeted instruments are intended to provide greater detail concerning particular outcomes, in terms of functioning and well-being, uniquely associated with a condition and/or its treatment. Selected examples of disease-specific instruments are listed in Table 2–4; however, hundreds of other targeted instruments are available.41–43 One of the instruments listed is the Asthma Quality-of-Life Questionnaire (AQLQ), a 32-item instrument developed to assess the impact of asthma on patients' everyday functioning and well-being.67 Results from research in which the AQLQ was used have appeared in promotional materials for the salmeterol inhaler (GlaxoSmithKline). In contrast to earlier prescription drug advertisements, which relied predominantly on physiologic claims,29 this was one of the first times a pharmaceutical firm promoted a product on the basis of data from trials that used HRQOL as a primary outcome measure. However, with the recent release of the FDA's guidance for industry titled Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims, the regulatory hurdle for HRQOL claims has been raised.14 Nevertheless, Leidy et al.72 have provided useful recommendations for evaluating the validity of HRQOL claims for labeling and promotion of pharmaceuticals.
TABLE 2-4 Selected Disease-Specific Quality-of-Life Instruments
|Arthritis Impact Measurement Scales (AIMS)66|
|Asthma Quality-of-Life Questionnaire (AQLQ)67|
|Functional Assessment of Cancer Therapy-Colorectal (FACT-C) Scale68|
|Kidney Disease Quality-of-Life (KDQOL) Instrument69|
|Quality of Life in Epilepsy (QOLIE)70|
|Medical Outcomes Study HIV Health Survey (MOS-HIV)71|
Disease- or condition-specific instruments are often, though not always, more sensitive than generic measures to particular changes in HRQOL secondary to the disease or its treatment. In addition, specific measures may appear more clinically relevant to patients and healthcare providers.44 However, a concern with using only specific instruments is that, by focusing on specific effects, they may overlook the general or overall impact on functioning and well-being. In studies involving pharmacotherapy, using both a generic and a specific instrument may be the best approach. The generic instrument provides a broader outcome assessment and allows comparison with other disease states or conditions in which it has been used. An appropriately selected specific instrument should provide more detailed information about the changes expected in the particular patient population.
A number of issues must be considered when evaluating existing HRQOL research and/or choosing the appropriate instrument to use when designing a study involving HRQOL assessment. A thorough review of these issues is not within the scope of this chapter; more in-depth reviews of methodologic considerations are available in the literature.22,73,74 Of particular concern are the psychometric properties of a chosen instrument. Psychometrics refers to the measurement of psychological constructs, such as HRQOL. Instruments should be developed and tested such that one can place confidence in the measurement made. Psychometric properties of measures (e.g., reliability and validity) are considered in the review criteria developed by the Scientific Advisory Committee of the Medical Outcomes Trust (MOT).75 The MOT is a depository and distributor of standardized health outcomes measurement instruments. Every instrument that is proposed for addition to the MOT list of approved instruments is reviewed against a rigorous set of eight attributes. These attributes provide a useful evaluative framework. The eight attributes of an instrument addressed by the review criteria are as follows: (a) conceptual and measurement model, (b) reliability, (c) validity, (d) responsiveness, (e) interpretability, (f) respondent and administrative burden, (g) alternate forms, and (h) cultural and language adaptations.
Conceptual and Measurement Models
A conceptual model is the rationale for and description of the concepts that a measurement instrument is intended to assess and the interrelationships of those concepts. A measurement model is an instrument's scale and subscale structure and the procedures followed to create scale and subscale scores. The SF-36 provides an example of well-defined conceptual and measurement models.51 It contains 36 items that cover nine theory-based health concepts, eight of which are measured by multi-item scales. There is a clearly defined method for creating the individual scale scores and the PCS and MCS summary scores.54
Reliability refers to the extent to which a measure gives consistent, reproducible results. The purpose of evaluating the reliability of an HRQOL instrument is to estimate how much of the variation in a score reflects true differences as opposed to random error. The two reliability assessment methods discussed most often in the HRQOL literature are internal consistency and test–retest reliability. Internal consistency is an assessment of the performance of items within a scale; it is a function of the number of items and their covariation.76 Internal consistency is commonly measured using Cronbach's α-coefficient. α-Coefficients >0.90 are recommended for making comparisons between individuals and >0.70 for comparisons between groups.77
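Cronbach's α compares the sum of the individual item variances with the variance of the total scale score: α = k/(k – 1) × (1 – sum of item variances / variance of the total score), where k is the number of items. A minimal Python sketch follows; the three-item, five-respondent data set is hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items is a list of k lists; each inner list holds one item's scores
    for the same n respondents, in the same respondent order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # scale score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


if __name__ == "__main__":
    items = [
        [2, 4, 3, 5, 1],  # item 1, five hypothetical respondents
        [3, 5, 3, 4, 2],  # item 2
        [2, 5, 4, 5, 1],  # item 3
    ]
    print(round(cronbach_alpha(items), 3))  # 0.945
```

For this example α is about 0.95, which would exceed both the 0.70 and 0.90 thresholds cited in the text.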
Test–retest reliability refers to the relationship between scores obtained from the same instrument on two or more separate occasions when all pertinent conditions remain relatively unchanged. It is usually evaluated using the intraclass correlation coefficient (ICC).74 However, HRQOL is not assumed to be constant over the course of time. In fact, most clinical studies attempt to assess how HRQOL changes. Nevertheless, test–retest reliability is an important measurement property that is often assessed with a retest interval of 2 to 14 days.78
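Several forms of the ICC exist. The Python sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form described by Shrout and Fleiss (often labeled ICC(2,1)), computed from two-way analysis of variance mean squares; the test–retest scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for an n x k table: rows are subjects, columns are occasions."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    identical = [[10, 10], [20, 20], [30, 30]]  # retest equals test
    shifted = [[10, 15], [20, 25], [30, 35]]    # every retest 5 points higher
    print(icc_2_1(identical))           # 1.0
    print(round(icc_2_1(shifted), 3))   # 0.889
```

The constant shift lowers the ICC even though subjects keep the same rank order, because this absolute-agreement form also penalizes systematic differences between occasions.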
Interrater reliability and equivalent-forms reliability are two other approaches to reliability assessment that are not used as commonly in HRQOL research. More in-depth discussions of these and the other reliability assessment methods are found elsewhere.74,78
Reliability is necessary but not sufficient for valid measurement.76 Validity is an estimation of the extent to which the instrument is measuring what it is supposed to be measuring. Validity is not an absolute property of an instrument. Hence, a measurement instrument is not “valid,” but empirical data can provide evidence to support its validity. Three types of validity commonly considered are criterion, content, and construct.
Content validity, which is infrequently tested statistically, refers to how adequately the questions/items capture the relevant aspects of the domain or concept being measured.
Criterion validity is demonstrated when a new measure corresponds to an established measure or observation that accurately reflects the phenomenon of interest. According to Streiner and Norman,78 criterion validation can be divided into two types: concurrent validation and predictive validation. With concurrent validation, a new scale is administered alongside an accepted measure in the field (i.e., the criterion). For example, the validity of a new measure of emotional well-being could be supported by showing a strong association (correlation) between it and the Beck Depression Inventory.79 For predictive validation, the criterion is not available immediately. For instance, if the developer of a new health status measure claims that it predicts 1-year mortality, then evidence for criterion validity becomes available only 1 year after administration.
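Concurrent validation usually comes down to a correlation coefficient. The sketch below computes Pearson's r in plain Python; the scores are hypothetical. A negative correlation is expected here if higher scores on the new measure mean better emotional well-being while higher Beck Depression Inventory (BDI) scores mean more depressive symptoms.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    new_measure = [80, 65, 70, 40, 55]  # hypothetical well-being scores
    bdi = [5, 14, 10, 30, 18]           # hypothetical BDI scores
    print(round(pearson_r(new_measure, bdi), 2))
```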
Construct validity refers to the relationship between measures purporting to measure the same underlying theoretical construct (convergent evidence) or purporting to measure different constructs (discriminant evidence).78 Although evidence for the construct validity of a measure might be established through comparisons with physiologic measures or organ pathology, it is more often obtained by comparing extreme groups that have been categorized by another means of assessment. For example, if a new PRO instrument is intended to measure impairment of physical function resulting from osteoarthritis, then the instrument's scores should correlate with a clinician's assessment of disease severity (e.g., none, mild, moderate, severe). Those patients judged by the clinician to have severe osteoarthritis would be expected to have scores on the new PRO instrument that reflect poorer physical function than those judged to have mild osteoarthritis.
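The known-groups comparison in the osteoarthritis example can be sketched as a check that mean scores decline as clinician-rated severity increases. The Python below assumes that higher scores mean better physical function; the severity labels follow the text, but the data and function names are hypothetical, and a real analysis would also test the group differences statistically.

```python
from statistics import mean

SEVERITY_ORDER = ("none", "mild", "moderate", "severe")


def group_means(records):
    """Mean PRO score per clinician-rated severity group.

    records is an iterable of (severity, score) pairs.
    """
    by_group = {}
    for severity, score in records:
        by_group.setdefault(severity, []).append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


def declines_with_severity(means, order=SEVERITY_ORDER):
    """True if mean scores strictly decrease from milder to more severe groups."""
    ordered = [means[g] for g in order if g in means]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    data = [("mild", 72), ("severe", 35), ("moderate", 55),
            ("mild", 78), ("severe", 41), ("moderate", 50)]
    means = group_means(data)
    print(means)
    print(declines_with_severity(means))  # True
```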
Responsiveness, or sensitivity to change, is the ability or power of the measure to detect clinically important change when it occurs.80 Although some authors have suggested that responsiveness is a psychometric property of a measure distinct from validity,81 others argue that responsiveness is an aspect of validity rather than a separate property.76,82
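Responsiveness is often summarized with distribution-based statistics. Two common ones, not described in the text, are the effect size (mean change divided by the standard deviation of baseline scores) and the standardized response mean (mean change divided by the standard deviation of the change scores). A minimal Python sketch with hypothetical paired scores:

```python
from statistics import mean, stdev


def responsiveness(baseline, follow_up):
    """Return (effect size, standardized response mean) for paired scores."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    change = [after - before for before, after in zip(baseline, follow_up)]
    effect_size = mean(change) / stdev(baseline)
    srm = mean(change) / stdev(change)
    return effect_size, srm


if __name__ == "__main__":
    before = [40, 50, 60, 50]  # hypothetical baseline scores
    after = [50, 58, 66, 62]   # hypothetical follow-up scores
    es, srm = responsiveness(before, after)
    print(round(es, 2), round(srm, 2))  # 1.1 3.49
```

Because the two statistics use different denominators, they can rank instruments differently; neither by itself establishes that a change is clinically important.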
Sidebar: HRQOL Controversy
What constitutes a minimally important difference on an HRQOL measure? Although the statistical significance of a change or difference score is often used to denote important change, it may overestimate or underestimate the true impact of the disease and/or its treatment in terms of change that is perceptible and important to patients. Discussions regarding the concept of minimally important difference are increasingly appearing in the literature.
Interpretability is the degree to which one can assign qualitative meaning to an instrument's quantitative scores. Interpretability is facilitated by comparison of a score or change in scores to a qualitative category that has clinical or commonly understood meaning. For example, it would be helpful to know how scale scores obtained in a specific patient sample compare with the scale scores of the general population. Ware et al. have provided very useful U.S. population-based normative data for the SF-36.83 U.S. population norms are also available for the EQ-5D and the HUI.84
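One simple way to place a sample score against population norms is to express it as a z-score or as a norm-based T-score (mean 50, standard deviation 10), the metric used, for example, for the SF-36 component summary scores. The norm mean and standard deviation below are hypothetical placeholders, not values from the published U.S. norms.

```python
def norm_based_scores(score, norm_mean, norm_sd):
    """Return (z-score, T-score) of a score relative to population norms."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (score - norm_mean) / norm_sd
    return z, 50 + 10 * z


if __name__ == "__main__":
    # Hypothetical: sample mean 60 on a scale whose population mean is 70 (SD 20).
    print(norm_based_scores(60, norm_mean=70, norm_sd=20))  # (-0.5, 45.0)
```

A T-score of 45 would be read as half a standard deviation below the general population average.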
Respondent and Administrative Burden
Respondent burden refers to the time, energy, and other demands placed on those to whom the instrument is administered. Administrative burden refers to the demands placed on those who administer the instrument. A practical aspect of HRQOL measurement is the length of the instrument or the administration time involved. Instruments should be as brief as possible without severely compromising the validity and reliability of the measurement. The longer the instrument, the greater the respondent burden, which can lead to unwillingness or refusal to complete the instrument or to incomplete responses.
Alternate forms of an instrument include all modes of administration other than the original source instrument. Evidence should be provided that supports the comparability of the alternate mode of administration with that of the original instrument.85 Many PRO measures can be administered in different ways. The primary modes of administration have traditionally been (a) self-administered or (b) interviewer-administered questionnaires, either in person or over the telephone.44 However, electronic (ePRO) modes of self-administration (e.g., PDAs and mobile devices with touch screens, interactive voice response systems, Web-based questionnaires) are becoming increasingly important.86 Proxy respondents (i.e., a healthcare provider, family member, or friend who responds for the subject when the subject is unable to complete the instrument) are sometimes used but are not recommended. Because HRQOL is such a subjective concept, patients must have the opportunity to provide their own perspective on the impact of illness and/or medical care on their functioning and well-being. The patient's perspective has been shown to differ considerably from that of outside observers, including physicians, family members, or others close to the patient.87
Cultural and Language Adaptations
Methods used to achieve conceptual and linguistic equivalence of cross-culturally adapted instruments should be explicitly stated.88 Evidence should be provided that the measurement properties of the adaptation are comparable with those of the original instrument. This is an extremely important issue when planning cross-national QOL assessment projects. However, it also is very important within countries that are multicultural, such as the United States.89 Many of the English-language instruments have been developed for the dominant U.S. culture and may not be appropriate for all patients.
Selection of an Appropriate Instrument
It is essential that the purpose of the measurement be well-defined before selection of a PRO instrument. Is the purpose of the measurement to describe the symptom burden, health status, or HRQOL of a patient population at a particular time or over time?90 Is it to document change in health outcomes associated with a particular intervention? Is it to monitor the effects of disease and its treatment for individual patients in routine clinical practice? These and other questions should be answered before PRO instruments are selected. Too many practitioner-researchers attempting to demonstrate improvements in outcomes resulting from a pharmaceutical product or service select a commonly used generic instrument, such as the SF-36, with the expectation that it will be sufficiently responsive to changes that may occur. The best approach may be to use a generic instrument in conjunction with a more targeted, disease-specific instrument.
Availability of Instruments
Many PRO instruments are publicly available. Although they can be used at little or no cost, a fee may be associated with the purchase of a user's guide or scoring manual. The MOT (www.outcomes-trust.org) provides links to a number of instruments, including the Duke Health Profile, QWB, MOS-HIV Health Survey, Migraine-Specific Quality of Life (MSQOL), and SIP. For information on availability of the SF-36 and SF-12, go to www.sf36.org. The FACIT (Functional Assessment of Chronic Illness Therapy) Web site (www.facit.org) has an extensive array of cancer- and chronic disease-targeted instruments that can be licensed for use. Developers of particular instruments often can be contacted through addresses provided in other books referenced at the end of this chapter.41–43