
Eremenco S et al. on behalf of the ISPOR PRO Mixed Modes Task Force. PRO data collection in clinical trials using mixed modes: Report of the ISPOR PRO Mixed Modes Good Research Practices Task Force. Value in Health. 2014; 17(5): 501-16. PMID: 25128043.

ABSTRACT: The objective of this report was to address the use and mixing of data collection modes within and between trials in which patient-reported outcome (PRO) end points are intended to be used to support medical product labeling. The report first addresses the factors that should be considered when selecting a mode or modes of PRO data collection in a clinical trial, which is often when mixing is first considered. Next, a summary of how to “faithfully” migrate instruments is presented, followed by a section on qualitative and quantitative study designs used to evaluate measurement equivalence of the new and original modes of data collection. Finally, the report discusses a number of issues that must be taken into account when mixing modes is deemed necessary or unavoidable within or between trials, including considerations of the risk of mixing at different levels within a clinical trial program and mixing between different types of platforms. In the absence of documented evidence of measurement equivalence, it is strongly recommended that a quantitative equivalence study be conducted before mixing modes in a trial to ensure that sufficient equivalence can be demonstrated to have confidence in pooling PRO data collected by the different modes. However, we also strongly discourage the mixing of paper and electronic field-based instruments and suggest that mixing of electronic modes be considered for clinical trials only after equivalence has been established. If proceeding with mixing modes, it is important to implement data collection carefully in the trial itself, in a planned manner at the country level or higher, and to minimize ad hoc mixing by sites or individual subjects. Finally, when mixing occurs, it must be addressed in the statistical analysis plan for the trial, and the ability to pool the data must be evaluated before treatment effects are estimated from the mixed-modes data.
A successful mixed modes trial requires a “faithful migration,” measurement equivalence established between modes, and carefully planned implementation to minimize the risk of increased measurement error impacting the power of the trial to detect a treatment effect.

Tang ST, McCorkle R. Use of family proxies in quality of life research for cancer patients at the end of life: a literature review. Cancer Invest. 2002; 20(7-8): 1086-104. PMID: 12449742.

ABSTRACT: One of the main goals of end-of-life care is to achieve the best quality of life (QOL) for patients and their families. Quality of life, therefore, represents a significant outcome indicator for evaluating end-of-life care interventions. However, nonresponse bias and nonrandom missing data in QOL research at the end of life limit the generalizability and threaten the internal validity of study findings. The use of family proxies to assess patients’ QOL has been suggested as a solution. Demonstration of satisfactory levels of agreement between proxies and patients is warranted before family caregivers’ or other proxies’ assessments can be employed when patients cannot provide their own information. Contrary to the conclusion made by Sprangers and Aaronson [The Role of Health Care Providers and Significant Others in Evaluating the Quality of Life of Patients with Chronic Disease: A Review. J. Clin. Epidemiol. 1992, 45, 743-760], this review of the literature suggests that terminal cancer patients and their family caregivers agreed at least moderately well on the patients’ QOL. The bias introduced by the use of family informants is generally of a modest magnitude. When discrepancies existed, family caregivers held, without exception, a more negative view of patients’ QOL than did the patients themselves; this is important to remember when using family proxies. The degree of agreement between terminal cancer patients’ and their family caregivers’ assessments varies as a function of the dimensions of QOL being measured and the patient’s health status. However, the accuracy of family caregivers’ assessments can be improved by assessing both patients and family caregivers concurrently over time. Several suggestions for future research are provided to better understand the factors influencing agreement between patient and family assessments and to enhance the quality of statistical analyses on this topic.

Pew Research Center. Older Adults and Technology Use. 2014.

Research and statistics about older adults and technology use, including general statistics; usage and adoption practices; and attitudes, impacts, and barriers to adoption.

Gwaltney CJ, Shields AL, Shiffman S. Equivalence of electronic and paper-and-pencil administration of patient-reported outcome measures: a meta-analytic review. Value Health. 2008; 11(2): 322-333. PMID: 18380645.

ABSTRACT: OBJECTIVES: Patient-reported outcomes (PROs; self-report assessments) are increasingly important in evaluating medical care and treatment efficacy. Electronic administration of PROs via computer is becoming widespread. This article reviews the literature addressing whether computer-administered tests are equivalent to their paper-and-pencil forms. METHODS: Meta-analysis was used to synthesize 65 studies that directly assessed the equivalence of computer versus paper versions of PROs used in clinical trials. A total of 46 unique studies, evaluating 278 scales, provided sufficient detail to allow quantitative analysis. RESULTS: Among 233 direct comparisons, the average mean difference between modes averaged 0.2% of the scale range (e.g., 0.02 points on a 10-point scale), and 93% were within +/-5% of the scale range. Among 207 correlation coefficients between paper and computer instruments (typically intraclass correlation coefficients), the average weighted correlation was 0.90; 94% of correlations were at least 0.75. Because the cross-mode correlation (paper vs. computer) is also a test-retest correlation, with potential variation because of retest, we compared it to the within-mode (paper vs. paper) test-retest correlation. In four comparisons that evaluated both, the average cross-mode paper-to-computer correlation was almost identical to the within-mode correlation for readministration of a paper measure (0.88 vs. 0.91). CONCLUSIONS: Extensive evidence indicates that paper- and computer-administered PROs are equivalent.
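The “average weighted correlation” reported in this meta-analysis is conventionally obtained by pooling study-level correlations on the Fisher z scale, weighting each study by n - 3 (the inverse of the variance of z). A minimal Python sketch of that calculation; the correlations and sample sizes below are invented for illustration, not drawn from the review:

```python
import math

def pool_correlations(correlations, sample_sizes):
    """Pool per-study correlations via the Fisher z-transformation,
    weighting each study by n - 3, then back-transform to the r scale."""
    zs = [math.atanh(r) for r in correlations]
    weights = [n - 3 for n in sample_sizes]
    z_bar = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(z_bar)

# Hypothetical cross-mode correlations from three equivalence studies
pooled = pool_correlations([0.88, 0.92, 0.85], [50, 120, 80])
print(round(pooled, 3))
```

Larger studies pull the pooled value toward their own estimate, which is why a single large trial can dominate the average.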

Muehlhausen W et al. Equivalence of electronic and paper administration of patient-reported outcome measures: a systematic review and meta-analysis of studies conducted between 2007 and 2013. Health Qual Life Outcomes. 2015; 13: 167. PMID: 26446159.

ABSTRACT: OBJECTIVE: To conduct a systematic review and meta-analysis of the equivalence between electronic and paper administration of patient-reported outcome measures (PROMs) in studies conducted subsequent to those included in Gwaltney et al.’s 2008 review. METHODS: A systematic literature review of PROM equivalence studies conducted between 2007 and 2013 identified 1,997 records, from which 72 studies met pre-defined inclusion/exclusion criteria. PRO data from each study were extracted, in terms of both correlation coefficients (ICCs, Spearman and Pearson correlations, Kappa statistics) and mean differences (standardized by the standard deviation, SD, and the response scale range). Pooled estimates of correlation and mean difference were estimated. The modifying effects of mode of administration, year of publication, study design, time interval between administrations, mean age of participants and publication type were examined. RESULTS: Four hundred thirty-five individual correlations were extracted; these correlations were highly variable (I2 = 93.8) but showed generally good equivalence, with ICCs ranging from 0.65 to 0.99 and the pooled correlation coefficient being 0.88 (95% CI 0.87 to 0.88). Standardised mean differences for 307 studies were small and less variable (I2 = 33.5), with a pooled standardised mean difference of 0.037 (95% CI 0.031 to 0.042). Average administration mode/platform-specific correlations from 56 studies (61 estimates) had a pooled estimate of 0.88 (95% CI 0.86 to 0.90) and were still highly variable (I2 = 92.1). Similarly, average platform-specific ICCs from 39 studies (42 estimates) had a pooled estimate of 0.90 (95% CI 0.88 to 0.92) with an I2 of 91.5. After excluding 20 studies with outlying correlation coefficients (≥3 SD from the mean), the I2 was 54.4, with equivalence still high, the overall pooled correlation coefficient being 0.88 (95% CI 0.87 to 0.88). Agreement was found to be greater in more recent studies.
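The I2 values quoted in this abstract index heterogeneity: the percentage of variability in effect estimates attributable to between-study differences rather than chance. As an illustrative sketch (not the authors' code), I2 can be derived from Cochran's Q computed on Fisher-z-transformed correlations; the study values below are made up:

```python
import math

def i_squared(correlations, sample_sizes):
    """Higgins' I^2 heterogeneity statistic for a set of correlations,
    computed on the Fisher z scale, where the variance of z is 1/(n-3)."""
    zs = [math.atanh(r) for r in correlations]
    ws = [n - 3 for n in sample_sizes]
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))  # Cochran's Q
    df = len(correlations) - 1
    # I^2 = (Q - df) / Q, floored at 0 and expressed as a percentage
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
```

Identical study estimates give I2 = 0, while the wide 0.65-0.99 ICC spread reported above yields an I2 in the 90s, matching the paper's "highly variable" description.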

Bennett AV et al. Mode equivalence and acceptability of tablet computer-, interactive voice response system-, and paper-based administration of the U.S. National Cancer Institute’s Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). Health Qual Life Outcomes. 2016; 14(1): 24. PMID: 26892667.

ABSTRACT: BACKGROUND: PRO-CTCAE is a library of items that measure cancer treatment-related symptomatic adverse events (NCI Contracts: HHSN261201000043C and HHSN261201000063C). The objective of this study is to examine the equivalence and acceptability of the three data collection modes (Web-enabled touchscreen tablet computer, Interactive voice response system [IVRS], and paper) available within the US National Cancer Institute (NCI) Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) measurement system. METHODS: Participants (n = 112; median age 56.5; 24% high school or less) receiving treatment for cancer at seven US sites completed 28 PRO-CTCAE items (scoring range 0-4) by three modes (order randomized) at a single study visit. Subjects completed one page (approx. 15 items) of the EORTC QLQ-C30 between each mode as a distractor. Item scores by mode were compared using intraclass correlation coefficients (ICC); differences in scores within the 3-mode crossover design were evaluated with mixed-effects models. Difficulties with each mode experienced by participants were also assessed. RESULTS: 103 (92%) completed questionnaires by all three modes. The median ICC comparing tablet vs IVRS was 0.78 (range 0.55-0.90); tablet vs paper: 0.81 (0.62-0.96); IVRS vs paper: 0.78 (0.60-0.91); 89% of ICCs were ≥0.70. Item-level mean differences by mode were small (medians [ranges] for tablet vs IVRS = -0.04 [-0.16 to 0.22]; tablet vs paper = -0.02 [-0.11 to 0.14]; IVRS vs paper = 0.02 [-0.07 to 0.19]), and 57/81 (70%) items had bootstrapped 95% CIs around the effect sizes within +/-0.20. The median time to complete the questionnaire by tablet was 3.4 min; IVRS: 5.8; paper: 4.0. The proportion of participants by mode who reported “no problems” responding to the questionnaire was 86% tablet, 72% IVRS, and 98% paper.
CONCLUSIONS: Mode equivalence of items was moderate to high, and comparable to test-retest reliability (median ICC = 0.80). Each mode was acceptable to a majority of respondents. Although the study was powered to detect moderate or larger discrepancies between modes, the observed ICCs and very small mean differences between modes provide evidence to support study designs that are responsive to patient or investigator preference for mode of administration, and justify comparison of results and pooled analyses across studies that employ different PRO-CTCAE modes of administration.
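The bootstrapped 95% CIs around the mode-difference effect sizes mentioned above can be illustrated with a simple percentile bootstrap over paired mode differences. This is a generic sketch, not the study's actual analysis code, and the data in the test are fabricated:

```python
import random

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired mode differences.
    Resamples the differences with replacement n_boot times and takes
    the alpha/2 and 1 - alpha/2 quantiles of the resampled means."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(diffs, k=len(diffs))) / len(diffs)
        for _ in range(n_boot)
    )
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

A criterion like the study's "+/-0.20" check then reduces to testing whether both bounds fall inside that interval.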

Bennett AV et al. Evaluation of mode equivalence of the MSKCC Bowel Function Instrument, LASA Quality of Life, and Subjective Significance Questionnaire items administered by Web, interactive voice response system (IVRS), and paper. Qual Life Res. 2015; e-pub ahead of print. PMID: 26590838.

ABSTRACT: PURPOSE: To assess the equivalence of patient-reported outcome (PRO) survey responses across Web, interactive voice response system (IVRS), and paper modes of administration. METHODS: Postoperative colorectal cancer patients with home Web/e-mail and phone were randomly assigned to one of the eight study groups: Groups 1-6 completed the survey via Web, IVRS, and paper, in one of the six possible orders; Groups 7-8 completed the survey twice, either by Web or by IVRS. The 20-item survey, including the MSKCC Bowel Function Instrument (BFI), the LASA Quality of Life (QOL) scale, and the Subjective Significance Questionnaire (SSQ) adapted to bowel function, was completed from home on consecutive days. Mode equivalence was assessed by comparison of mean scores across modes and intraclass correlation coefficients (ICCs) and was compared to the test-retest reliability of Web and IVRS. RESULTS: Of 170 patients, 157 completed at least one survey and were included in analysis. Patients had a mean age of 56 (SD = 11); 53% were male, 81% white, and 53% had colon and 47% rectal cancer; 78% completed all assigned surveys. Mean scores for BFI total score, BFI subscale scores, LASA QOL, and adapted SSQ varied by mode by less than one-third of a score point. ICCs across mode were: BFI total score (Web-paper = 0.96, Web-IVRS = 0.97, paper-IVRS = 0.97); BFI subscales (range = 0.88-0.98); LASA QOL (Web-paper = 0.98, Web-IVRS = 0.78, paper-IVRS = 0.80); and SSQ (Web-paper = 0.92, Web-IVRS = 0.86, paper-IVRS = 0.79). CONCLUSIONS: Mode equivalence was demonstrated for the BFI total score, BFI subscales, LASA QOL, and adapted SSQ, supporting the use of multiple modes of PRO data capture in clinical trials.

Bjorner JB et al. Method of administration of PROMIS scales did not significantly impact score level, reliability or validity. J Clin Epidemiol. 2014; 67(1): 108-113. PMID: 24262772.

ABSTRACT: OBJECTIVES: To test the impact of the method of administration (MOA) on score level, reliability, and validity of scales developed in the Patient Reported Outcomes Measurement Information System (PROMIS). STUDY DESIGN AND SETTING: Two nonoverlapping parallel forms each containing eight items from each of three PROMIS item banks (Physical Function, Fatigue, and Depression) were completed by 923 adults with chronic obstructive pulmonary disease, depression, or rheumatoid arthritis. In a randomized crossover design, subjects answered one form by interactive voice response (IVR) technology, paper questionnaire (PQ), personal digital assistant (PDA), or personal computer (PC) and a second form by PC, in the same administration. Method equivalence was evaluated through analyses of difference scores, intraclass correlations (ICCs), and convergent/discriminant validity. RESULTS: In difference score analyses, no significant mode differences were found and all confidence intervals were within the prespecified minimal important difference of 0.2 standard deviation. Parallel-forms reliabilities were very high (ICC = 0.85-0.93). Only one across-mode ICC was significantly lower than the same-mode ICC. Tests of validity showed no differential effect by MOA. Participants preferred screen interface over PQ and IVR. CONCLUSION: We found no statistically or clinically significant differences in score levels or psychometric properties of IVR, PQ, or PDA administration compared with PC.

Lundy JJ, Coons SJ, Aaronson NK. Testing the measurement equivalence of paper and interactive voice response system versions of the EORTC QLQ-C30. Qual Life Res. 2014; 23(1): 229-237. PMID: 23765449.

ABSTRACT: PURPOSE: The objective of this study was to evaluate the measurement equivalence of an interactive voice response system (IVRS) version and the original paper-based version of the EORTC QLQ-C30. METHODS: The QLQ-C30 is a cancer-specific, health-related quality of life questionnaire consisting of nine multi-item scales (physical, role, emotional, cognitive and social functioning, fatigue, nausea/vomiting, pain, and quality of life) and six single item measures (dyspnea, insomnia, appetite loss, constipation, diarrhea, and financial problems). This study utilized a crossover design with subjects randomly assigned to one of two assessment orders: (1) paper then IVRS or (2) IVRS then paper. Equivalence between the two administration modes was established by comparing the 95% lower confidence interval (CI) of the intraclass correlation coefficients (ICCs) for each scale, with a critical value of 0.70. RESULTS: The ICCs for the nine multi-item scales were all above 0.79, ranging from 0.791 to 0.899 (ICC 95% lower CI range 0.726-0.865) and significantly different from our threshold reliability of 0.70. The ICCs for the six single items ranged from 0.689 to 0.896 (ICC 95% lower CI range 0.611-0.888). Two of the items, insomnia and appetite loss, were not statistically different from 0.70. When considered together, the per-protocol analysis results support the equivalence of the paper and IVRS versions of the QLQ-C30 for 13 of the 15 scores. CONCLUSION: This analysis provides evidence that the scores obtained from the IVRS version of the QLQ-C30 are equivalent to those obtained with the original paper version except for the insomnia and appetite loss items.
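The ICCs used in this and several of the other equivalence studies above are two-way models for absolute agreement. As an illustrative sketch (not the authors' analysis code), ICC(2,1) can be computed from the mean squares of a two-way ANOVA decomposition; the paper/IVRS score pairs below are invented:

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random-effects, absolute agreement, single rating.
    `scores` is one row per subject, one column per administration mode."""
    n, k = len(scores), len(scores[0])  # subjects, modes
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)          # between-subjects mean square
    ms_cols = ss_cols / (k - 1)          # between-modes mean square
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# Invented paper vs IVRS item scores for six subjects (0-5 scale)
pairs = [[4, 4], [2, 3], [5, 5], [1, 1], [3, 4], [4, 3]]
print(round(icc_2_1(pairs), 3))
```

A full analysis like the one described in the abstract would also compute the 95% lower confidence bound of each ICC (which requires F-distribution quantiles, e.g. from scipy.stats) and compare it against the 0.70 criterion.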

Lundy JJ, Coons SJ. Measurement equivalence of interactive voice response and paper versions of the EQ-5D in a cancer patient sample. Value Health. 2011; 14(6): 867-871. PMID: 21914508.

ABSTRACT: OBJECTIVE: To assess the measurement equivalence of an interactive voice response (IVR) version of the EQ-5D with the original paper version. METHODS: Subjects were randomly assigned to: 1) paper then IVR, or 2) IVR then paper and asked to complete the questionnaire two days apart. The analyses tested mean differences (repeated measures analysis of variance) and reliability (intraclass correlation coefficient [ICC]). Equivalence of the means was established if the 95% confidence interval (CI) of the mean difference was within the minimally important difference interval: -0.035 to 0.035 for the EQ-5D index and -3 to 3 for the visual analog scale (EQ VAS). ICC adequacy was tested by comparing the ICC 95% lower CI with a critical value of 0.70. RESULTS: The analyses included 113 subjects for the index and 109 subjects for the EQ VAS. For the index, the adjusted means of the paper and IVR versions were 0.789 ± 0.016 and 0.798 ± 0.017, respectively. The 95% CI of the mean difference was -0.024 to 0.006, within the equivalence interval. The ICC was 0.894 (95% lower CI 0.857), significantly greater than 0.70. For the EQ VAS, the adjusted means were 71.94 ± 1.87 for paper and 74.63 ± 1.79 for IVR. The 95% CI of the mean difference was -4.347 to -1.049, partially within the equivalence interval. The ICC was 0.887 (95% lower CI 0.840), significantly greater than 0.70. CONCLUSIONS: The results provide evidence that the EQ-5D scores on the IVR version were sufficiently equivalent to those obtained on the paper version.
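The equivalence criterion this study applies (declare modes equivalent when the 95% CI of the mean difference lies inside the minimally important difference interval) can be sketched as follows. A normal approximation stands in for the study's repeated-measures ANOVA, and the paired scores are fabricated:

```python
from statistics import NormalDist, mean, stdev

def mean_diff_equivalent(mode_a, mode_b, mid, confidence=0.95):
    """True if the CI for the mean paired difference (mode_a - mode_b)
    lies entirely inside (-mid, +mid). Normal approximation for
    illustration; small samples would use a paired t interval."""
    diffs = [a - b for a, b in zip(mode_a, mode_b)]
    se = stdev(diffs) / len(diffs) ** 0.5
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    d = mean(diffs)
    return -mid < d - z * se and d + z * se < mid

# Fabricated EQ-5D index scores: paper vs IVR on the same 8 subjects,
# tested against the 0.035 MID used in the study
paper = [0.81, 0.74, 0.82, 0.80, 0.79, 0.79, 0.79, 0.81]
ivr = [0.80, 0.75, 0.82, 0.79, 0.80, 0.79, 0.78, 0.82]
print(mean_diff_equivalent(paper, ivr, mid=0.035))
```

Note that this is a two-one-sided-tests style criterion: a wide CI fails equivalence even when the point estimate of the difference is near zero.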

Magnus BE et al. Mode effects between computer self-administration and telephone interviewer-administration of the PROMIS® pediatric measures, self- and proxy report. Qual Life Res. 2016; e-pub ahead of print. PMID: 26724944.

ABSTRACT: OBJECTIVE: To test equivalence of scores obtained with the PROMIS® pediatric Depressive Symptoms, Fatigue, and Mobility measures across two modes of administration: computer self-administration and telephone interviewer-administration. If mode effects are found, to estimate the magnitude and direction of the mode effects. METHODS: Respondents from an internet survey panel completed the child self-report and parent proxy-report versions of the PROMIS® pediatric Depressive Symptoms, Fatigue, and Mobility measures using both computer self-administration and telephone interviewer-administration in a crossed counterbalanced design. Pearson correlations and multivariate analysis of variance were used to examine the effects of mode of administration as well as order and form effects. RESULTS: Correlations between scores obtained with the two modes of administration were high. Scores were generally comparable across modes of administration, but there were some small significant effects involving mode of administration; significant differences in scores between the two modes ranged from 1.24 to 4.36 points. CONCLUSIONS: Scores for these pediatric PROMIS measures are generally comparable across modes of administration. Studies planning to use multiple modes (e.g., self-administration and interviewer-administration) should exercise good study design principles to minimize possible confounding effects from mixed modes.

Marcano Belisario JS et al. Comparison of self-administered survey questionnaire responses collected using mobile apps versus other methods. Cochrane Database Syst Rev. 2015; 7: MR000042. PMID: 26212714.

*No Abstract Included*