Outcomes Instrumentation for Clinician Researchers


This content was created by the ARHP Research subcommittee to help researchers, clinicians, and other interested parties learn about valid and reliable patient-oriented outcome instruments that are useful in the study and management of arthritis and other rheumatic conditions.

One important group of instruments measures patient quality of life (QOL), a complex, abstract, and multidimensional concept that is often difficult to define and measure. Another group of instruments useful in the study of arthritis management measures functional status. Because various conceptual and operational definitions have been used in studies to document the effects of health care interventions on patient outcomes in arthritis, we have attempted to explain some of the key measurement issues that relate to the instruments. These explanations are brief, but they should serve as an introduction to key measurement issues.

The following criteria were developed for inclusion of measures in the volume of Arthritis Care & Research:

  1. Measures should be relevant to and/or used in rheumatology research and/or clinical practice
  2. Measures should be in the public domain (i.e., do not require purchase)
  3. Measures should require very little or easily attainable equipment to administer
  4. Measures should not require special training or certification to administer
  5. Measures should not be biologically based (e.g., radiographic grading systems)

The reader is also referred to a newer Special Issue of Arthritis Care & Research that reviewed Patient Outcomes in Rheumatology in 2011 (Volume 63, No S11). This issue contains 35 reviews covering over 250 measures in four primary domains: Pathology and Symptoms, Function, Health Status and Quality of Life, and Psychological. The Committee on ARHP Research has also developed this website as an additional source of summaries of some of the most commonly used patient outcome measures in rheumatology.


Inclusion of an instrument in this website should not be interpreted as an endorsement of the instrument by the ACR/ARHP. Selection of instruments for this website began with those listed in Arthritis Care & Research, Volume 49, Issue S5, 2003.

Key Concepts

Health: defined by the World Health Organization as "not merely the absence of disease or infirmity"3 but as a concept that incorporates well-being or wellness in all areas of life (physical, mental, emotional, social, spiritual). Health, according to this definition, is a broad concept incorporating disease, illness, and wellness. When considered as a dimension of quality of life, health is the dimension that falls under the purview of health care providers and can be addressed by a health care intervention.

Health Status: an individual's relative level of wellness and illness, taking into account the presence of biological or physiological dysfunction, symptoms, and functional impairment.

Health Perceptions/Perceived Health Status: subjective ratings by the affected individual of his or her health status. Some people perceive themselves as healthy despite suffering from arthritis, while others perceive themselves as ill when no objective evidence of disease can be found.

Quality of life: an individual's satisfaction or happiness with life in domains he or she considers important1. Also known as "life satisfaction" or "subjective well-being," it is now sometimes referred to as "overall quality of life" or "global quality of life" to distinguish it from "health-related quality of life." It is the broadest of all concepts, influenced by all of the dimensions of life that contribute to its richness and reward, pleasure and pain. These dimensions include, but are not limited to, health. A person's assessment of satisfaction with life involves two subjective considerations:

  1. How important a given domain is for that person
  2. How satisfied one is with that domain.

For instance, a person can be dissatisfied with a domain that he or she considers to be of relatively little importance and still report a satisfactory overall quality of life. However, dissatisfaction with a domain of great importance to an individual would clearly contribute to lower overall life quality.
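The interplay of importance and satisfaction can be illustrated with a small sketch. The importance-weighted average below is only one possible way to combine the two ratings; the weighting scheme, the 1 to 5 scales, the function name, and the example domains are all assumptions made for illustration and are not taken from any specific instrument.

```python
def weighted_life_satisfaction(ratings):
    """Return the importance-weighted mean satisfaction.

    ratings maps a domain name to (importance, satisfaction), both on a
    hypothetical 1-5 scale. The weighting rule is an illustrative assumption.
    """
    total_importance = sum(importance for importance, _ in ratings.values())
    if total_importance == 0:
        raise ValueError("at least one domain must have non-zero importance")
    weighted_sum = sum(importance * satisfaction
                       for importance, satisfaction in ratings.values())
    return weighted_sum / total_importance


# Hypothetical person: very dissatisfied with a domain of little importance.
person = {
    "health": (5, 4),
    "relations with friends": (4, 5),
    "active recreation": (1, 1),
}
print(round(weighted_life_satisfaction(person), 2))
```

Under these assumptions, the low satisfaction with active recreation barely lowers the overall score because the domain carries little weight, whereas the same low rating on a heavily weighted domain such as health would lower it substantially.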

Numerous taxonomies of life domains have been proposed by social, psychological, gerontological, and health sciences researchers based on studies of general populations of both well and ill people. A typical taxonomy is that of Flanagan2, which categorizes 15 dimensions of life quality into five domains, as shown below in the table.

Table: Flanagan’s Dimensions of Quality of Life

Domain: Physical and material well-being
  1. Material well-being and financial security
  2. Health and personal safety

Domain: Relations with other people
  3. Relations with spouse
  4. Having and rearing children
  5. Relations with parents, siblings, or other relatives
  6. Relations with friends

Domain: Social, community, civic activities
  7. Helping and encouraging others
  8. Participating in local and governmental affairs

Domain: Personal development, fulfillment
  9. Intellectual development
  10. Understanding and planning
  11. Occupational role (career)
  12. Creativity and personal expression

Domain: Recreation
  13. Socializing with others
  14. Passive and observational recreational activities
  15. Participating in active recreation

Functional status: an individual's ability to perform normal daily activities required to meet basic needs, fulfill usual roles, and maintain health and well-being5,6. Functional status includes functional capacity and functional performance. Functional status can be influenced by biological or physiological impairment, symptoms, mood, and other factors.6 It is also likely to be influenced by health perceptions. For example, a person whom most would judge to be well but who views himself or herself as ill may have a low level of functional performance in relation to his or her capacity.5

Functional capacity: represents an individual's capacity to perform daily activities in the physical, psychological, social, and spiritual domains of life. Example - A maximal exercise test measures physical functional capacity.

Functional performance: refers to the activities people actually do during the course of their daily lives.5 Example - A self-report of activities of daily living measures functional performance.

Mood: refers to emotional responses to stressors such as changes in health state. These emotional reactions to life experiences are usually reflected in an individual's affect: the face one presents to the world.

  1. Mood describes a sustained emotional response that, when persistent, can color a person's view of the world.
  2. Depression, anxiety, and anger are emotions that sometimes coexist with physical illness, and may affect the individual's functional performance, symptom and health perceptions, and quality of life.6,12,13 Conversely, decreased functional status may contribute to depressed mood in people with chronic lung disease.12

Symptoms: are patients' perceptions of "an abnormal physical, emotional, or cognitive state"6.

References for Key Concepts Section

  1. Oleson M. Subjectively perceived quality of life. Image 1990; 22:187-190.
  2. Flanagan JC. A research approach to improving our quality of life. Am Psychol 1978; 33:138-147.
  3. World Health Organization. Constitution of the World Health Organization: Chronicle of the World Health Organization 1. Geneva: WHO, 1947.
  4. Ware JE. The status of health assessment 1994. Annu Rev Public Health 1995; 16:327-354.
  5. Leidy NK. Functional status and the forward progress of merry-go-rounds: Toward a coherent analytical framework. Nurs Res 1994; 43:196-202.
  6. Wilson IB, Cleary PD. Linking clinical variables with health-related quality of life. JAMA 1995; 273:59-65.
  7. Kinsman RA, Yaroush RA, Fernandez E, Dirks JF, Schocket M, Fukuhara J. Symptoms and experiences in chronic bronchitis and emphysema. Chest 1983; 83:755-761.
  8. McSweeny AJ. Quality of life in relation to COPD. In McSweeny AJ, Grant I, eds. Chronic Obstructive Pulmonary Disease: A Behavioral Perspective. New York: Marcel Dekker, 1988.
  9. Dudley DL, Glaser EM, Jorgenson BN, Logan DL. Psychosocial concomitants to rehabilitation in chronic obstructive pulmonary disease: Part 1. Psychosocial and psychological considerations. Chest 1980; 77:413-420.
  10. Light RW, Merrill EJ, Despars JA, Gordon GH, Mutalipassi LR. Prevalence of depression and anxiety in patients with COPD: Relationship to functional capacity. Chest 1985; 87:35-38.
  11. Curtis JR. Assessing health-related quality of life in chronic pulmonary disease. In Fishman AP, ed. Pulmonary Rehabilitation. New York: Marcel Dekker, 1996.
  12. Anderson KL. The effect of chronic obstructive pulmonary disease on quality of life. Res Nurs Health 1995; 18:547-556.
  13. Moody L, McCormick K, Williams A. Disease and symptom severity, functional status, and quality of life in chronic bronchitis and emphysema. J Behav Med 1990; 13:297-306.
  14. Torrance G, O'Brien B. An interview on utility measurement. J Rheumatol 1995; 22:1200-1202.
  15. Redelmeier DA, Detsky AS. A clinician's guide to utility measurement. Prim Care 1995; 22:271-280.
  16. Patrick DL, Starks HE, Cain KC, Uhlmann RF, Pearlman RA. Measuring preferences for health states worse than death. Med Decis Making 1994; 14:9-18.
  17. Feeny D, Labelle R, Torrance GW. Integrating economic evaluations and quality of life assessments. In Spilker B, ed. Quality of Life Assessments in Clinical Trials. New York: Raven Press, 1990.
  18. Curtis JR, Martin DP, Martin TM. Patient-assessed health outcomes in chronic lung disease: What are they, how do they help us, and where do we go from here? Am J Respir Crit Care Med 1997; 156:1032-1039.

Measurement Terms – Psychometric Information

Reliability: Defined as the extent to which the instrument yields the same results on repeated measures.

Interrater: is intended to demonstrate the equivalence or agreement among raters who are collecting data. Interrater reliability is appropriate when the subjects in a report are being observed or rated by a researcher or assistant. It is not appropriate when the research subjects are rating their own behavior, perceptions, opinions, or attitudes.
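As an illustration of how agreement between two raters might be quantified, the sketch below computes Cohen's kappa, a widely used chance-corrected agreement statistic for categorical ratings. The source does not name a specific statistic, so the choice of kappa, the function name, and the joint ratings shown are assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1 - expected)


# Hypothetical joint assessments of ten joints by two examiners.
examiner_1 = ["swollen", "swollen", "normal", "normal", "swollen",
              "normal", "normal", "swollen", "normal", "normal"]
examiner_2 = ["swollen", "normal", "normal", "normal", "swollen",
              "normal", "swollen", "swollen", "normal", "normal"]
print(round(cohens_kappa(examiner_1, examiner_2), 3))
```

Kappa equals 1 for perfect agreement and is near 0 when agreement is no better than chance.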

Internal consistency: is the extent to which the items of a test or procedure assess the same characteristic, skill, or quality. It is a measure of the precision of the measuring instruments used in a study. This type of reliability often helps researchers interpret data and predict the value of scores and the limits of the relationship among variables.
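Internal consistency is often summarized with Cronbach's alpha, although the text above does not prescribe a particular statistic. The following is a minimal sketch of that calculation, assuming a complete (no missing data) table of item scores; the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(item_scores) < 2:
        raise ValueError("at least two respondents are required")
    k = len(item_scores[0])
    if k < 2 or any(len(row) != k for row in item_scores):
        raise ValueError("each respondent needs the same number (>= 2) of items")
    # Sample variance of each item across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*item_scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical responses of five people to a three-item scale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
print(round(cronbach_alpha(responses), 3))
```

Values closer to 1 suggest that the items vary together and are therefore plausibly measuring the same characteristic.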

Test-retest: is intended to demonstrate the stability or consistency of a measure; it is inappropriate for an instrument intended to measure a non-stable construct. To determine stability, a measure or test is repeated on the same subjects at a future date. Results are compared and correlated with the initial test to give a measure of stability.
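The correlation step described above can be sketched as follows, using the Pearson correlation coefficient between the first and second administrations. The function name and the scores are hypothetical, and other stability statistics (for example, intraclass correlation) may be preferred in practice.

```python
from math import sqrt
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same measure."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need paired scores from at least two subjects")
    mean_first, mean_second = mean(first), mean(second)
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary within each administration")
    return cross / sqrt(ss_first * ss_second)


# Hypothetical scores for five subjects, tested twice two weeks apart.
initial_test = [10, 14, 8, 20, 16]
retest = [11, 13, 9, 19, 17]
print(round(pearson_r(initial_test, retest), 2))
```

A coefficient near 1 indicates that subjects kept roughly the same relative standing across the two administrations.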

Equivalency reliability: is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association. In quantitative studies and particularly in experimental studies, a correlation coefficient, statistically referred to as r, is used to show the strength of the correlation between a dependent variable (the subject under study), and one or more independent variables, which are manipulated to determine effects on the dependent variable. An important consideration is that equivalency reliability is concerned with correlational, not causal, relationships.

Validity: Defined as the accuracy with which a measurement tool measures the concept it is intended to measure. Researchers should be concerned with both external and internal validity. External validity refers to the extent to which the results of a study are generalizable. Internal validity refers to (1) the rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore. In studies that do not explore causal relationships, only the first of these definitions should be considered when assessing internal validity.

Content: is based on the extent to which a measurement reflects the specific intended domain of content. For example, a researcher needing to measure an attitude like self-esteem must decide what constitutes a relevant domain of content for that attitude. For socio-cultural studies, content validity forces the researchers to define the very domains they are attempting to study.

Face: is concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information the researchers are attempting to obtain? Does it seem well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support.

Criterion-related: is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid.

For example, imagine a hands-on driving test has been shown to be an accurate test of driving skills. By comparing the scores on a written driving test with the scores from the hands-on driving test, the written test can be validated using a criterion-related strategy.

Construct: seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.

Construct validity can be broken down into two sub-categories, convergent validity and discriminant validity.

  1. Convergent validity is the actual general agreement among ratings, gathered independently of one another, where measures should be theoretically related.
  2. Discriminant validity is the lack of a relationship among measures which theoretically should not be related.

To understand whether a piece of research has construct validity, three steps should be followed.

  1. First, the theoretical relationships must be specified.
  2. Second, the empirical relationships between the measures of the concepts must be examined.
  3. Third, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure being tested.

There are four major approaches to testing construct validity: hypothesis testing, convergent and divergent, contrasted groups, and factor analytical.

Responsiveness/sensitivity to change: is the ability of a measure to detect meaningful change over time, classically described as responsiveness. It involves two issues:

  1. First, the measure must detect meaningful change when it has occurred
  2. Second, it must remain stable when no change has occurred. The ability of a scale to detect change when it has occurred describes the scale's sensitivity to change, whereas the stability of a scale in patients who have not changed represents its specificity to change.
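One way to put numbers on these two issues is the standardized response mean (SRM), the mean change score divided by the standard deviation of the change scores. The SRM is not named in the text above and is offered only as one common responsiveness index; the function name and the patient scores below are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the standard deviation of the change scores."""
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need paired scores from at least two patients")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    spread = stdev(changes)
    if spread == 0:
        raise ValueError("change scores do not vary; SRM is undefined")
    return mean(changes) / spread


# Hypothetical pain scores (higher = worse) before and after an intervention.
improved_before = [40, 35, 50, 45, 30]
improved_after = [30, 28, 38, 36, 24]

# Hypothetical scores for patients judged clinically unchanged.
stable_before = [40, 35, 50, 45, 30]
stable_after = [41, 34, 50, 46, 29]

print(round(standardized_response_mean(improved_before, improved_after), 2))
print(round(standardized_response_mean(stable_before, stable_after), 2))
```

In this sketch, a large absolute SRM in the group that truly changed reflects sensitivity to change, while an SRM near zero in the stable group reflects specificity to change.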

In order to investigate the sensitivity of a diagnostic test, the investigator looks only at the patients diagnosed with the disease based on a gold standard and calculates the percentage of patients properly diagnosed by the test.

  1. Specificity is calculated by looking only at patients without the disease and calculating the percentage with a negative test result.
  2. Sensitivity and specificity may then be combined into a statistic such as a positive likelihood ratio (sensitivity / (1 - specificity)) that combines both patients with and without the disease and describes the overall diagnostic ability of the test.
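The calculations described above can be sketched from the four cells of a two-by-two table comparing the test with the gold standard. The function name and the counts are hypothetical.

```python
def diagnostic_accuracy(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, positive likelihood ratio)."""
    diseased = true_pos + false_neg
    healthy = true_neg + false_pos
    if diseased == 0 or healthy == 0:
        raise ValueError("need patients both with and without the disease")
    # Share of patients with the disease who test positive.
    sensitivity = true_pos / diseased
    # Share of patients without the disease who test negative.
    specificity = true_neg / healthy
    # LR+ = sensitivity / (1 - specificity); infinite if no false positives.
    if false_pos == 0:
        positive_lr = float("inf")
    else:
        positive_lr = sensitivity / (1 - specificity)
    return sensitivity, specificity, positive_lr


# Hypothetical counts: 100 patients with the disease, 100 without.
sens, spec, lr_pos = diagnostic_accuracy(true_pos=80, false_neg=20,
                                         true_neg=90, false_pos=10)
print(round(sens, 2), round(spec, 2), round(lr_pos, 1))
```

A positive likelihood ratio well above 1 means a positive result is much more likely in a patient with the disease than in one without it.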

Reliability and Validity

Introduction to Validity: William Trochim's introduction to validity in his comprehensive online textbook about research methods and issues.
Measurement Validity Types

Reliability: Trochim's overview of reliability.
Trochim's Overview of Reliability

External Validity: William Trochim's discussion of external validity.
Trochim's Discussion of External Validity

Educational Psychology Interactive: Internal and External Validity: A Web document addressing key issues of external and internal validity.
Educational Psychology Interactive: Internal and External Validity (General)

Field Methods (formerly Cultural Anthropology Methods Journal): An online journal containing articles on the practical application of research methods when conducting qualitative and quantitative research. Reliability and validity are addressed throughout.
Field Methods

Internal Validity Tutorial: An interactive tutorial on internal validity.
Internal Validity Tutorial

For questions or comments, contact ARHP@rheumatology.org.

Updated by ARHP Research Committee.