Shengping Yang PhD, Gilbert Berdine, MD
Corresponding author: Shengping Yang
Contact Information: Shengping.firstname.lastname@example.org
I am planning a clinical trial to compare two diets on reducing the risk of type II diabetes. Because there is a restriction on the total budget, I would prefer to enroll a small number of participants. Meanwhile, it is important that there is sufficient statistical power to detect a clinically meaningful difference. Is there any study design that can be utilized?
Keywords: crossover design, carryover effect, sample size
Although a smaller sample size is usually associated with lower statistical power, certain study designs can achieve greater power with the same sample size. One such design is the crossover design. A crossover design is a longitudinal study design with multiple intervention periods separated by one or more “washout” periods; each subject receives a sequence of interventions in the order intervention, washout, intervention, and so on. In nearly all crossover designs, the number of interventions each subject receives and the number of periods in which each subject participates are the same for all subjects.
A crossover design has both advantages and limitations. We will use the most common crossover design, the 2 × 2 crossover design, to illustrate how it works.
A 2 × 2 crossover design has 2 sequences, 2 periods and 2 interventions. To implement such a design, subjects are first randomized to one of the two sequences (Table 1). Subjects randomized to sequence 1 receive intervention A in period 1 and “cross over” to intervention B in period 2; those randomized to sequence 2 receive intervention B in period 1 and “cross over” to intervention A in period 2. A hallmark of the design is the “washout” period between periods 1 and 2: after the period 1 intervention is completed, the period 2 intervention does not start immediately; instead, there is a period of no intervention. The washout period is designed to ensure that the effect of the period 1 intervention has worn off, so that the response in period 2 can be attributed to the period 2 intervention rather than to the period 1 intervention.
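A minimal sketch of the 1:1 randomization described above, assuming a simple shuffle-and-split scheme (block or stratified randomization are common alternatives). The function name, labels and seed are hypothetical and only illustrate the AB/BA structure.

```python
import random


def assign_sequences(subject_ids, seed=None):
    """Randomize subjects 1:1 to the two sequences of a 2 x 2 crossover.

    Sequence 1 (AB): A in period 1, washout, B in period 2.
    Sequence 2 (BA): B in period 1, washout, A in period 2.
    With an odd number of subjects, sequence 2 receives the extra subject.
    """
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)  # random order, then split in half
    half = len(ids) // 2
    schedule = {}
    for position, sid in enumerate(ids):
        if position < half:
            schedule[sid] = ("A", "washout", "B")  # sequence 1
        else:
            schedule[sid] = ("B", "washout", "A")  # sequence 2
    return schedule


if __name__ == "__main__":
    plan = assign_sequences(range(1, 11), seed=2019)
    for sid in sorted(plan):
        print(sid, " -> ".join(plan[sid]))
```

Shuffling and splitting guarantees equal sequence sizes (for an even number of subjects), which the analysis and sample size formulas in this article assume.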
Table 1. The layout of a two-period crossover design
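| | Period 1 | Washout period | Period 2 |
|---|---|---|---|
| Sequence 1 | Intervention A | No intervention | Intervention B |
| Sequence 2 | Intervention B | No intervention | Intervention A |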
The key assumption of a crossover design is that there is no carryover effect, i.e., the effect of the intervention from one period does not continue, even after the washout period, to affect a subject in the subsequent period. Ideally, in a 2 × 2 crossover study, the effect of the first-period intervention disappears completely before the second period starts. In reality, however, there are virtually always subtle differences between the two periods; for example, subjects might be more familiar with the protocol in the second period and therefore respond differently to an intervention than they would have if they had received it in the first period. Because of this assumption, a crossover design is most useful for studying chronic and stable diseases and is usually inappropriate for studying curable diseases. For example, if a subject is cured in the first period, the outcome in the second period is “cured” regardless of which intervention the subject receives. Applying a crossover design to a curable disease therefore violates the no-carryover assumption.
We present the cell mean model of a 2 × 2 crossover design in Table 2. In period 1, subjects in sequence 1 (sequence 2) receive intervention A (B), so the expected value is the sum of the overall mean, the effect of intervention A (B) and the effect of period 1. In period 2, subjects in sequence 1 (sequence 2) receive intervention B (A), so the expected value is the sum of the overall mean, the effect of intervention B (A), the effect of period 2, and the carryover effect $\lambda_A$ ($\lambda_B$) of the period 1 intervention.
Table 2. The cell mean model
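| | Period 1 | Period 2 |
|---|---|---|
| Sequence 1 | $E(Y_{i11}) = \mu + \tau_A + \rho_1$ | $E(Y_{i12}) = \mu + \tau_B + \rho_2 + \lambda_A$ |
| Sequence 2 | $E(Y_{i21}) = \mu + \tau_B + \rho_1$ | $E(Y_{i22}) = \mu + \tau_A + \rho_2 + \lambda_B$ |
$Y_{ijk}$: Outcome of subject $i$ in sequence $j$ and period $k$
$\mu$: Overall mean
$i = 1, \ldots, n_j$, where $n_j$ is the number of subjects in sequence $j$
$\tau_A$: Effect of intervention A
$\tau_B$: Effect of intervention B
$\rho_1$: Effect of period 1
$\rho_2$: Effect of period 2
$\lambda_A$: Carryover effect of intervention A
$\lambda_B$: Carryover effect of intervention B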
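The cell mean model can be written as a small function, which makes it easy to check the between-sequence contrasts used in the analysis. This is a sketch with hypothetical parameter values; the (sequence, period) keys follow Table 2, and the function names are illustrative.

```python
def cell_means(mu, tau_a, tau_b, rho1, rho2, lam_a, lam_b):
    """Expected outcome for each (sequence, period) cell of a 2 x 2 crossover."""
    return {
        (1, 1): mu + tau_a + rho1,          # sequence 1, period 1: A
        (1, 2): mu + tau_b + rho2 + lam_a,  # sequence 1, period 2: B plus carryover of A
        (2, 1): mu + tau_b + rho1,          # sequence 2, period 1: B
        (2, 2): mu + tau_a + rho2 + lam_b,  # sequence 2, period 2: A plus carryover of B
    }


def contrasts(cells):
    """Between-sequence contrasts of period differences and of period sums."""
    diff_contrast = (cells[1, 1] - cells[1, 2]) - (cells[2, 1] - cells[2, 2])
    sum_contrast = (cells[1, 1] + cells[1, 2]) - (cells[2, 1] + cells[2, 2])
    return diff_contrast, sum_contrast


if __name__ == "__main__":
    # Hypothetical values: A is 2 units higher than B, period 2 adds 1 unit.
    equal = cell_means(10, 2, 0, 0, 1, 0.5, 0.5)
    print(contrasts(equal))    # (4.0, 0.0): 2*(tau_a - tau_b), sums equal
    unequal = cell_means(10, 2, 0, 0, 1, 1.5, 0.0)
    print(contrasts(unequal))  # (2.5, 1.5): biased by lam_b - lam_a; sums differ by lam_a - lam_b
```

The period effects cancel in both contrasts; with equal carryover the difference contrast recovers $2(\tau_A - \tau_B)$, while unequal carryover shifts it by $\lambda_B - \lambda_A$ and shows up in the sum contrast.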
As in a parallel-group trial, the primary goal of a crossover study is to compare the effects of interventions A and B. However, because each subject receives both interventions, the measurements under A and under B are not independent samples, so a simple comparison of the subjects who received A with those who received B is not appropriate. Instead, the following procedure can be used. First, compute each subject's difference between the period 1 and period 2 measurements in both sequences. According to the cell mean model,
$$E(Y_{i11} - Y_{i12}) = \tau_A - \tau_B + \rho_1 - \rho_2 - \lambda_A, \qquad E(Y_{i21} - Y_{i22}) = \tau_B - \tau_A + \rho_1 - \rho_2 - \lambda_B.$$
Assuming that there is no carryover effect (or, more generally, that the carryover effects are equal, $\lambda_A = \lambda_B$), the period effects cancel and
$$E(Y_{i11} - Y_{i12}) - E(Y_{i21} - Y_{i22}) = 2(\tau_A - \tau_B),$$
which is twice the difference between the effects of interventions A and B. If the period differences are approximately normally distributed, the two interventions can therefore be compared by comparing the period differences between the two sequences with a two-sample t test. When the normality assumption is questionable, the t test can be replaced with a Mann-Whitney U test.
This comparison is valid only when there is no carryover effect, so a test of carryover effect should be performed first. Because
$$E(Y_{i11} + Y_{i12}) = 2\mu + \tau_A + \tau_B + \rho_1 + \rho_2 + \lambda_A, \qquad E(Y_{i21} + Y_{i22}) = 2\mu + \tau_A + \tau_B + \rho_1 + \rho_2 + \lambda_B,$$
it follows that $E(Y_{i11} + Y_{i12}) - E(Y_{i21} + Y_{i22}) = \lambda_A - \lambda_B$. Testing for equal carryover effects is therefore equivalent to comparing each subject's sum of the period 1 and period 2 measurements between the two sequences.
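A standard-library sketch of this two-step procedure for subject-level data, assuming each subject's data are a (period 1, period 2) pair and using the pooled-variance two-sample t statistic. The data and function names are hypothetical. It returns t statistics and degrees of freedom rather than p-values; if SciPy is available, `scipy.stats.ttest_ind` (or `scipy.stats.mannwhitneyu` for the rank-based alternative) could be applied to the same sums and differences instead.

```python
import math
from statistics import mean, variance


def pooled_t(x, y):
    """Pooled-variance two-sample t statistic; returns (t, degrees of freedom)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


def analyze_2x2(seq1, seq2):
    """seq1: (period 1, period 2) pairs for AB subjects; seq2: same for BA subjects.

    Needs at least two subjects per sequence.
    """
    # Carryover test: compare within-subject sums between sequences.
    sums1 = [p1 + p2 for p1, p2 in seq1]
    sums2 = [p1 + p2 for p1, p2 in seq2]
    t_carry, df = pooled_t(sums1, sums2)
    # Treatment test: compare period differences (period 1 - period 2).
    diffs1 = [p1 - p2 for p1, p2 in seq1]
    diffs2 = [p1 - p2 for p1, p2 in seq2]
    t_trt, _ = pooled_t(diffs1, diffs2)
    # Under equal carryover, E(diffs1) - E(diffs2) = 2 * (tau_A - tau_B).
    effect = (mean(diffs1) - mean(diffs2)) / 2
    return {"t_carryover": t_carry, "t_treatment": t_trt,
            "effect_A_minus_B": effect, "df": df}


if __name__ == "__main__":
    # Hypothetical data, three subjects per sequence.
    seq1 = [(12, 10), (14, 11), (11, 10)]  # AB
    seq2 = [(9, 12), (10, 14), (8, 10)]    # BA
    print(analyze_2x2(seq1, seq2))
```

With these made-up numbers the estimated A minus B effect is 2.5, the treatment t statistic is large, and the carryover t statistic is small, each on 4 degrees of freedom.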
A mixed-effects regression model can also be used to analyze data from a crossover design and is more flexible; however, it is not the focus of this article.
In general, each subject in a crossover study receives every intervention at least once. The advantage of this design is that interventions can be compared without adjusting for subject-level confounding factors, such as gender and age, because each subject serves as his/her own control. This is especially valuable when the sample size is small: with few subjects, the distributions of confounders can differ considerably between groups in a parallel-group design even after randomization, which makes it difficult to separate the true intervention effect from the effect of the confounders. In a crossover study, by contrast, such subject-level confounding is removed by design.
Because each subject is measured more than once, e.g., twice (once in period 1 and once in period 2) in a 2 × 2 crossover design, the interventions are compared within subjects. When the repeated measurements on the same subject are positively correlated, this removes much of the between-subject variability and can substantially reduce the variance of the estimated intervention effect, which translates directly into greater statistical power. We will demonstrate this advantage by comparing the sample size required by a crossover design with that required by a parallel-group design analyzed with a two-sample t test.
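As a worked illustration of this point (a standard identity, with notation introduced here), let $Y_1$ and $Y_2$ be one subject's measurements under the two interventions, with variances $\sigma_1^2$ and $\sigma_2^2$ and within-subject correlation $r$ (the symbol $r$ is used to avoid confusion with the period effects $\rho_1$, $\rho_2$). The variance of the within-subject difference is

```latex
\begin{aligned}
\operatorname{Var}(Y_1 - Y_2) &= \sigma_1^2 + \sigma_2^2 - 2r\,\sigma_1\sigma_2 \\
                              &= 2\sigma^2(1 - r) \quad \text{when } \sigma_1 = \sigma_2 = \sigma .
\end{aligned}
```

In a parallel-group design the two measurements come from different, independent subjects, so the corresponding variance is $\sigma_1^2 + \sigma_2^2$, i.e., $2\sigma^2$ with equal variances. For example, with $r = 0.6$ the within-subject difference has variance $0.8\sigma^2$, less than half of $2\sigma^2$.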
For a given (two-sided) type I error $\alpha$ and type II error $\beta$, and assuming equal group sizes, the number of subjects per group for a two-sample t test can be approximated (using the normal, or z, approximation) by
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, (\sigma_1^2 + \sigma_2^2)}{\Delta^2},$$
where $z_\gamma$ is the $100\gamma$th percentile of the standard normal distribution, $\Delta = \mu_1 - \mu_2$ is the difference between the two group means, and $\sigma_1^2$ and $\sigma_2^2$ are the variances of the two groups.
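The number of subjects per sequence for a 2 × 2 crossover design can be calculated by using
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma_m^2}{2\Delta^2},$$
where $\sigma_m^2$ is the variance of the within-subject difference between the period 1 and period 2 measurements. Because repeated measurements on the same subject are usually positively correlated, $\sigma_m^2$ is usually much smaller than $\sigma_1^2 + \sigma_2^2$, and thus the sample size required for a crossover design can be much smaller than that required for a two-sample t test, given the same type I and type II errors.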
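A minimal Python sketch of the two normal-approximation formulas: $n = (z_{1-\alpha/2}+z_{1-\beta})^2(\sigma_1^2+\sigma_2^2)/\Delta^2$ per group for the parallel-group design, and $n = (z_{1-\alpha/2}+z_{1-\beta})^2\sigma_m^2/(2\Delta^2)$ per sequence for the 2 × 2 crossover, where $\sigma_m^2$ is taken to be the variance of the within-subject period difference, $\sigma_1^2+\sigma_2^2-2r\sigma_1\sigma_2$, with a hypothetical within-subject correlation $r$. The effect size, variances and correlations are made up for illustration, and the function names are illustrative. Only the standard library (`statistics.NormalDist`) is used.

```python
import math
from statistics import NormalDist


def z_sum(alpha, power):
    """z_{1-alpha/2} + z_{1-beta}, where power = 1 - beta (two-sided alpha)."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def n_parallel(delta, var1, var2, alpha=0.05, power=0.80):
    """Subjects per group for a two-sample comparison (normal approximation)."""
    return math.ceil(z_sum(alpha, power) ** 2 * (var1 + var2) / delta ** 2)


def var_within_diff(var1, var2, r):
    """Variance of Y1 - Y2 for one subject; r is the within-subject correlation."""
    return var1 + var2 - 2 * r * math.sqrt(var1 * var2)


def n_crossover(delta, var_m, alpha=0.05, power=0.80):
    """Subjects per sequence for a 2 x 2 crossover (normal approximation)."""
    return math.ceil(z_sum(alpha, power) ** 2 * var_m / (2 * delta ** 2))


if __name__ == "__main__":
    # Hypothetical inputs: difference of 5 units, SD of 10 under each intervention.
    delta, var = 5.0, 100.0
    print("parallel, per group:", n_parallel(delta, var, var))
    for r in (0.0, 0.3, 0.6):
        vm = var_within_diff(var, var, r)
        print(f"crossover (r={r}), per sequence:", n_crossover(delta, vm))
```

With these hypothetical inputs the sketch gives 63 subjects per group (126 in total) for the parallel-group design, versus 32, 22 and 13 subjects per sequence (64, 44 and 26 in total) for the crossover design at r = 0, 0.3 and 0.6. Even at r = 0 the crossover needs about half as many subjects, because each subject contributes a measurement under both interventions.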
While crossover design has its advantages over parallel-group design, it can sometimes have serious limitations.
The success of a crossover design depends largely on the size of the carryover effect. If the carryover effect is negligible, a crossover design is more efficient than a parallel-group design. If the carryover effect is strong, however, it is confounded with the intervention effect, which prevents the intervention effect from being estimated correctly. Not surprisingly, testing for a carryover effect is a critical step in analyzing crossover data. However, the statistical power for testing a carryover effect is typically much lower than that for testing the intervention effect, because the carryover test compares subject totals between sequences and therefore retains the between-subject variability; thus a non-significant carryover test result does not necessarily mean that the carryover effect can be ignored. Additionally, if a significant carryover effect is detected, arguably the best approach is to compare the interventions using only the data collected in the first period, which amounts to a parallel-group comparison and gives up the efficiency of the crossover design.
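To illustrate why the carryover test tends to be underpowered, the sketch below compares the approximate power of the two tests in a 2 × 2 design with the same number of subjects per sequence. It assumes equal variances $\sigma^2$ in both periods and a within-subject correlation $r$, both hypothetical. The treatment test compares within-subject differences, whose variance is $2\sigma^2(1-r)$; the carryover test compares within-subject sums, whose variance is $2\sigma^2(1+r)$, so positive correlation helps the former and hurts the latter. Power uses the normal approximation and ignores the negligible opposite tail.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(contrast, var_per_subject, n_per_seq, alpha=0.05):
    """Approximate two-sided power for comparing two sequence means.

    contrast: expected difference between the two sequence means
    var_per_subject: variance of the per-subject quantity (difference or sum)
    """
    se = math.sqrt(2 * var_per_subject / n_per_seq)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(abs(contrast) / se - z_crit)


def compare_tests(tau_diff, lam_diff, sigma2, r, n_per_seq, alpha=0.05):
    # Treatment test: E(d1) - E(d2) = 2 * (tau_A - tau_B); Var(d) = 2*sigma2*(1 - r)
    p_trt = approx_power(2 * tau_diff, 2 * sigma2 * (1 - r), n_per_seq, alpha)
    # Carryover test: E(s1) - E(s2) = lam_A - lam_B; Var(s) = 2*sigma2*(1 + r)
    p_carry = approx_power(lam_diff, 2 * sigma2 * (1 + r), n_per_seq, alpha)
    return p_trt, p_carry


if __name__ == "__main__":
    # Hypothetical: treatment and carryover differences both 5 units,
    # SD 10, r = 0.6, 13 subjects per sequence.
    p_trt, p_carry = compare_tests(5.0, 5.0, 100.0, 0.6, 13)
    print(f"treatment test power ~ {p_trt:.2f}, carryover test power ~ {p_carry:.2f}")
```

With these hypothetical values the treatment test has power of roughly 0.8, while a carryover difference of the same size is detected with power of only roughly 0.1.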
Although a crossover design requires far fewer subjects than a parallel-group design, the costs of a crossover study might not be reduced proportionally, especially for studies with a large number of repeated measurements.
Depending on the nature of a specific study, a crossover study can have a long duration (given the additional washout period(s)), which might result in a high dropout rate. The planned enrollment should therefore be increased to ensure that enough subjects complete all interventions.
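One simple, commonly used adjustment (assumed here for illustration) is to divide the required number of completers by the expected completion rate, i.e., 1 minus the anticipated dropout rate. A minimal sketch with a hypothetical function name and a hypothetical 20% dropout rate:

```python
import math


def enrollment_needed(n_completers, dropout_rate):
    """Subjects to enroll so that about n_completers finish all periods.

    dropout_rate is the expected proportion (0 <= rate < 1) of enrolled
    subjects who do not complete the study.
    """
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # round() guards against floating-point noise such as 19.999999999999996
    return math.ceil(round(n_completers / (1 - dropout_rate), 9))


if __name__ == "__main__":
    # e.g., 13 completers needed per sequence, 20% expected dropout
    print(enrollment_needed(13, 0.20))  # 17
```

This adjustment assumes dropout is unrelated to the intervention sequence; if dropout differs by sequence or period, a more careful adjustment would be needed.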
In summary, a crossover design is more efficient than a parallel-group design and is highly useful when recruitment is challenging and/or the cost per subject is very high. However, a crossover design is most suitable for trials in chronic, stable diseases, ideally with minimal carryover effect given a sufficient washout period.
Article citation: Yang S, Berdine G. Crossover design. The Southwest Respiratory and Critical Care Chronicles 2019;7(30):63–66
From: Department of Biostatistics (SY), Pennington Biomedical Research Center, Baton Rouge, LA; Department of Internal Medicine (GB), Texas Tech University Health Sciences Center, Lubbock, Texas
Conflicts of interest: none
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.