Reliability: Defined as the extent to which the instrument yields the same results on repeated measures.
Interrater: is intended to demonstrate the equivalence or agreement among raters who are collecting data. Interrater reliability is appropriate when the subjects in a report are being observed or rated by a researcher or assistant. It is not appropriate when the research subjects are rating their own behavior, perceptions, opinions, or attitudes.
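Rater agreement can be quantified in several ways; one commonly used statistic (not named in the text above) is Cohen's kappa, which corrects raw percent agreement for agreement expected by chance. The sketch below, with made-up ratings, illustrates the idea in Python.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same subjects."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of subjects on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no ratings of eight subjects by two observers.
a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
print(round(cohens_kappa(a, b), 3))  # kappa of 1.0 would mean perfect agreement
```

Here the raters agree on 6 of 8 subjects (75%), but because half that agreement is expected by chance, kappa is only 0.5.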
Internal consistency: is the extent to which the items of a test or procedure assess the same characteristic, skill, or quality. It is a measure of the agreement among the items of the measuring instrument used in a study. This type of reliability often helps researchers interpret data and predict the value of scores and the limits of the relationship among variables.
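One widely used index of internal consistency (not named in the text above) is Cronbach's alpha, which compares the variance of individual items to the variance of their total. A minimal sketch in Python with invented questionnaire data:

```python
from statistics import variance  # sample variance

def cronbach_alpha(items):
    """items: one list of scores per questionnaire item,
    each list covering the same respondents in the same order."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    # Alpha is high when item variances are small relative to total variance,
    # i.e., when the items rise and fall together.
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Three self-esteem items answered by five respondents (made-up data).
items = [
    [4, 3, 5, 2, 4],
    [4, 4, 5, 2, 3],
    [5, 3, 4, 1, 4],
]
print(round(cronbach_alpha(items), 2))
```

Values near 1 indicate that the items behave as measures of a single underlying characteristic; values near 0 indicate they do not.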
Test-retest: is intended to demonstrate the stability or consistency of a measure; it is inappropriate for an instrument intended to measure a non-stable construct. To determine stability, a measure or test is repeated on the same subjects at a future date, and the results are compared and correlated with the initial test to give a measure of stability.
Equivalency reliability: is the extent to which two items measure identical concepts at an identical level of difficulty. Equivalency reliability is determined by relating two sets of test scores to one another to highlight the degree of relationship or association. In quantitative studies, and particularly in experimental studies, a correlation coefficient, statistically referred to as r, is used to show the strength of the correlation between a dependent variable (the subject under study) and one or more independent variables, which are manipulated to determine effects on the dependent variable. An important consideration is that equivalency reliability is concerned with correlational, not causal, relationships.
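The correlation coefficient r mentioned above can be computed directly from two sets of scores. A minimal sketch of the Pearson formulation in Python, using invented scores from two forms of the same test:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Co-variation of the two score sets around their means...
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # ...scaled by each set's own variation, so r falls between -1 and 1.
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# Scores of six subjects on two parallel test forms (made-up data).
form_a = [12, 15, 11, 18, 14, 17]
form_b = [13, 16, 10, 19, 15, 16]
print(round(pearson_r(form_a, form_b), 2))
```

An r near 1 means the two forms rank the subjects almost identically, which is the pattern equivalency reliability looks for; as the text notes, this describes association only, not causation.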
Validity: Defined as the accuracy with which a measurement tool measures the concept it is intended to measure. Researchers should be concerned with both external and internal validity. External validity refers to the extent to which the results of a study are generalizable. Internal validity refers to (1) the rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore. In studies that do not explore causal relationships, only the first of these definitions should be considered when assessing internal validity.
Content: is based on the extent to which a measurement reflects the specific intended domain of content. For example, a researcher needing to measure an attitude like self-esteem must decide what constitutes a relevant domain of content for that attitude. For socio-cultural studies, content validity forces the researchers to define the very domains they are attempting to study.
Face: is concerned with how a measure or procedure appears. Does it seem like a reasonable way to gain the information the researchers are attempting to obtain? Does it seem well designed? Does it seem as though it will work reliably? Unlike content validity, face validity does not depend on established theories for support.
Criterion-related: is used to demonstrate the accuracy of a measure or procedure by comparing it with another measure or procedure which has been demonstrated to be valid.
For example, imagine a hands-on driving test has been shown to be an accurate test of driving skills. A written driving test can then be validated with a criterion-related strategy: its scores are compared with scores from the hands-on test, which serves as the criterion.
Construct: seeks agreement between a theoretical concept and a specific measuring device or procedure. For example, a researcher inventing a new IQ test might spend a great deal of time attempting to "define" intelligence in order to reach an acceptable level of construct validity.
Construct validity can be broken down into two sub-categories, convergent validity and discriminant validity.
- Convergent validity is the general agreement among ratings, gathered independently of one another, of measures that should be theoretically related.
- Discriminant validity is the lack of a relationship among measures which theoretically should not be related.
To understand whether a piece of research has construct validity, three steps should be followed.
- First, the theoretical relationships must be specified.
- Second, the empirical relationships between the measures of the concepts must be examined.
- Third, the empirical evidence must be interpreted in terms of how it clarifies the construct validity of the particular measure being tested.
There are four major approaches to testing construct validity: hypothesis testing, convergent and discriminant validation, contrasted groups, and factor analysis.
Responsiveness/sensitivity to change: is the ability of a measure to detect meaningful change over time, classically described as responsiveness. It involves two issues:
- First, the measure must detect meaningful change when it has occurred.
- Second, it must remain stable when no change has occurred. The ability of a scale to detect change when it has occurred describes the scale's sensitivity to change, whereas the stability of a scale in patients who have not changed represents its specificity to change.
In order to investigate the sensitivity of a diagnostic test, the investigator looks only at the patients diagnosed with the disease based on a gold standard and calculates the percentage of patients properly diagnosed by the test.
- Specificity is calculated by looking only at patients without the disease and calculating the percentage with a negative test result.
- Sensitivity and specificity may then be combined into a statistic such as a positive likelihood ratio (sensitivity / [1 − specificity]) that combines both patients with and without the disease and describes the overall diagnostic ability of the test.
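The calculations above can be sketched from the four cells of a diagnostic study. A minimal Python example with hypothetical counts (not from the text):

```python
def diagnostic_stats(tp, fn, tn, fp):
    """tp/fn: diseased patients (per the gold standard) whose test was
    positive/negative; tn/fp: disease-free patients whose test was
    negative/positive."""
    sensitivity = tp / (tp + fn)   # diseased patients correctly detected
    specificity = tn / (tn + fp)   # disease-free patients correctly ruled out
    lr_positive = sensitivity / (1 - specificity)  # positive likelihood ratio
    return sensitivity, specificity, lr_positive

# Hypothetical study: 100 diseased and 100 disease-free patients.
sens, spec, lr = diagnostic_stats(tp=90, fn=10, tn=80, fp=20)
print(sens, spec, round(lr, 2))
```

With these invented counts, sensitivity is 0.9, specificity is 0.8, and the positive likelihood ratio is 4.5, meaning a positive result is 4.5 times as likely in a diseased patient as in a disease-free one.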
Reliability and Validity Links
Introduction to Validity: William Trochim's introduction to validity in his comprehensive online textbook about research methods and issues.
http://www.socialresearchmethods.net/kb/measval.php
Reliability: Trochim's overview of reliability.
http://www.socialresearchmethods.net/kb/reliable.php
External Validity: William Trochim's discussion of external validity.
http://www.socialresearchmethods.net/kb/external.php
Educational Psychology Interactive: Internal and External Validity: A Web document addressing key issues of external and internal validity.
http://www.edpsycinteractive.org/topics/intro/valdgn.html
Field Methods (formerly Cultural Anthropology Methods Journal): An online journal containing articles on the practical application of research methods when conducting qualitative and quantitative research. Reliability and validity are addressed throughout.
http://fmx.sagepub.com
Internal Validity Tutorial: An interactive tutorial on internal validity.
http://server.bmod.athabascau.ca/html/Validity/index.shtml