Sample size calculation for the Wilcoxon-Mann-Whitney test adjusting for ties
Based on [Zhao et al., 2008]
1. When to use the Wilcoxon-Mann-Whitney (WMW) test [1]?
The WMW test is a non-parametric test for comparing two groups of observations on a continuous or ordered categorical (ordinal) variable. It imposes no distributional assumption on the data.
Data representation
| Group | \(C_1\) | \(C_2\) | \(\cdots\) | \(C_D\) | Total |
|---|---|---|---|---|---|
| A | \(m_1(p_1)\) | \(m_2(p_2)\) | \(\cdots\) | \(m_D(p_D)\) | \(m\) |
| B | \(n_1(q_1)\) | \(n_2(q_2)\) | \(\cdots\) | \(n_D(q_D)\) | \(n\) |
| Total | \(M_1\) | \(M_2\) | \(\cdots\) | \(M_D\) | \(N\) |
where
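\(N\) is the total sample size; \(m\) and \(n\) are the sample sizes for Group A and Group B, respectively
\(M_c\) is the column total for category \(C_c\) (\(c = 1, \cdots, D\)), where \(D\) is the number of distinct outcome values
\(p_c\) is the proportion of Group A observations falling in category \(C_c\) (the Group A count divided by \(m\)), shown in parentheses; \(q_c\) is defined similarly for Group B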
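The quantities in the table can be computed directly from the two rows of counts. The following is a minimal sketch; the counts and the helper name `summarize` are hypothetical and serve only as an illustration.

```python
def summarize(counts_a, counts_b):
    """Return group sizes, within-group proportions and column totals.

    counts_a, counts_b: per-category counts for Group A and Group B,
    listed in the same (ordered) category order C_1, ..., C_D.
    """
    m, n = sum(counts_a), sum(counts_b)
    p = [c / m for c in counts_a]          # Group A proportions
    q = [c / n for c in counts_b]          # Group B proportions
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]  # M_1, ..., M_D
    return {"m": m, "n": n, "N": m + n, "p": p, "q": q, "M": col_totals}


# Hypothetical data with D = 4 ordered categories.
table = summarize([10, 15, 15, 10], [5, 10, 15, 20])
print(table["N"], table["M"], table["p"], table["q"])
```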
2. Details about WMW
The WMW test uses the competing probability \(\pi = \Pr(X > Y) + 0.5\Pr(X = Y)\) to quantify the difference between the two groups under comparison, where \(X\) and \(Y\) are random variables with CDFs \(F_X\) and \(F_Y\), respectively. The null hypothesis is
\[H_0: F_X = F_Y,\]
under which \(\pi = 0.5\).
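Given observations \(X_1, \cdots, X_m\) from Group A and \(Y_1, \cdots, Y_n\) from Group B, an unbiased estimator \(\hat{\pi}\) of the competing probability \(\pi\) is
\[\hat{\pi} = \frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\delta(X_i - Y_j),\]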
where \(\delta(t) = 1\) if \(t> 0\), 0.5 if \(t=0\), and 0 if \(t<0\).
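As an illustration, the estimator can be computed by averaging \(\delta\) over all pairs of observations. This is a minimal sketch with hypothetical data; the function names `delta` and `pi_hat` are chosen here for illustration, and `x` is assumed to hold the Group A observations.

```python
def delta(t):
    """1 if t > 0, 0.5 if t == 0 (a tie), 0 if t < 0."""
    if t > 0:
        return 1.0
    if t == 0:
        return 0.5
    return 0.0


def pi_hat(x, y):
    """Average of delta(x_i - y_j) over all m * n pairs."""
    total = sum(delta(xi - yj) for xi in x for yj in y)
    return total / (len(x) * len(y))


# Hypothetical ordinal scores containing ties.
x = [3, 4, 4, 5]
y = [1, 3, 4, 4]
print(pi_hat(x, y))
```

The double loop costs \(O(mn)\) operations, which is fine for small samples; for grouped ordinal data the same value can be obtained from the category counts alone.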
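The variance of \(\hat{\pi}\) under the null hypothesis can be calculated as
\[\mathrm{Var}_0(\hat{\pi}) = \frac{N+1}{12mn} - \frac{\sum_{c=1}^{D}\left(M_c^3 - M_c\right)}{12mnN(N-1)},\]
where \(m\) and \(n\) are the sample sizes of the two groups, \(N=m+n\), and \(M_c\) is the marginal total for the \(c\)-th distinct outcome value, that is, the number of observations pooled over both groups that are tied at that value. When there are no ties, every \(M_c = 1\) and the variance reduces to \((N+1)/(12mn)\).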
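The effect of ties on the null variance can be checked numerically. The sketch below assumes the standard tie-corrected form \((N+1)/(12mn) - \sum_c (M_c^3 - M_c)/(12mnN(N-1))\); the function name `null_variance` and the counts are hypothetical.

```python
def null_variance(counts_a, counts_b):
    """Tie-corrected null variance of pi_hat from per-category counts.

    Assumes the form (N+1)/(12mn) - sum(M_c^3 - M_c) / (12 m n N (N-1)),
    where M_c is the pooled count in category c.
    """
    m, n = sum(counts_a), sum(counts_b)
    N = m + n
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    tie_term = sum(M ** 3 - M for M in col_totals)
    return (N + 1) / (12 * m * n) - tie_term / (12 * m * n * N * (N - 1))


# Hypothetical 2 x 4 table with heavy ties.
with_ties = null_variance([10, 15, 15, 10], [5, 10, 15, 20])
# Same group sizes without ties would give (N + 1) / (12 m n).
without_ties = (100 + 1) / (12 * 50 * 50)
print(with_ties, without_ties)
```

Because each \(M_c^3 - M_c\) is non-negative, the tie correction can only shrink the variance relative to the no-ties value.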
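In the WMW test, the z-statistic is constructed under the null hypothesis as
\[Z = \frac{\hat{\pi} - 0.5}{\sqrt{\mathrm{Var}_0(\hat{\pi})}},\]
where \(\mathrm{Var}_0(\hat{\pi})\) denotes the variance of \(\hat{\pi}\) under the null hypothesis. For sufficiently large samples, \(Z\) is approximately standard normal.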
As a rule of thumb, Siegel and Castellan, for example, recommend using this normal approximation when \(m=3\) or \(4\) and \(n>12\), or when \(m>4\) and \(n>10\).
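Putting the pieces together, a normal-approximation WMW test on grouped ordinal data might look like the following sketch. It assumes that \(X\) corresponds to Group A, that the tie-corrected null variance has the standard form noted in the code comment, and that a two-sided p-value is wanted; the function name `wmw_z_test` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wmw_z_test(counts_a, counts_b):
    """Return (pi_hat, z, two-sided p-value) from per-category counts."""
    m, n = sum(counts_a), sum(counts_b)
    N = m + n

    # pi_hat: each Group A observation in category c "wins" against all
    # Group B observations in lower categories and half-wins against ties.
    wins = 0.0
    lower_b = 0  # Group B observations in strictly lower categories
    for a, b in zip(counts_a, counts_b):
        wins += a * (lower_b + 0.5 * b)
        lower_b += b
    pi_hat = wins / (m * n)

    # Assumed null variance:
    # (N+1)/(12mn) - sum(M_c^3 - M_c) / (12 m n N (N-1)).
    tie_term = sum((a + b) ** 3 - (a + b) for a, b in zip(counts_a, counts_b))
    var0 = (N + 1) / (12 * m * n) - tie_term / (12 * m * n * N * (N - 1))

    z = (pi_hat - 0.5) / sqrt(var0)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return pi_hat, z, p_value


# Hypothetical 2 x 4 table.
print(wmw_z_test([10, 15, 15, 10], [5, 10, 15, 20]))
```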
3. Sample size formula
To compute the total sample size \(N\), we assume the treatment fraction \(t = n/N\) is known.
In addition, we assume the proportions \(p_1,\cdots, p_D\) and \(q_1,\cdots,q_D\) in the table above are known.
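The sample size formula is derived from the fact that, under the normal approximation, a two-sided test at level \(\alpha\) attains power \(1-\beta\) when
\[|\mu_1 - \mu_0| = Z_{\alpha/2}\,\sigma_0 + Z_{\beta}\,\sigma_1,\]
where \(\mu_0\), \(\sigma_0\) and \(\mu_1\), \(\sigma_1\) are the means and standard deviations of \(\hat{\pi}\) under the null and the alternative hypothesis, respectively, and \(Z_{\alpha/2}\) and \(Z_{\beta}\) are the upper \((\alpha/2)\)th and \(\beta\)th quantiles of the standard normal distribution.
Further assuming \(\sigma_0 = \sigma_1\), together with the fact that \(\mu_0 = 0.5\), the sample size equation simplifies to
\[(Z_{\alpha/2} + Z_{\beta})^2\,\sigma_0^2 = (\mu_1 - 0.5)^2,\]
in which \(N\) enters through \(\sigma_0^2\).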
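Substituting \(m = (1-t)N\), \(n = tN\) and \(M_c \approx N[(1-t)p_c + tq_c]\) into the null variance gives, for large \(N\), \(\sigma_0^2 \approx \left[1 - \sum_{c=1}^{D}\left((1-t)p_c + tq_c\right)^3\right] / \left[12t(1-t)N\right]\), while under the alternative \(\mu_1 = \pi = \sum_{c=2}^{D} p_c \sum_{d=1}^{c-1} q_d + 0.5\sum_{c=1}^{D} p_c q_c\). The final sample size formula for the WMW test, for a two-sided significance level \(\alpha\) and power \(1-\beta\), is therefore
\[N = \frac{(Z_{\alpha/2} + Z_{\beta})^2\left[1 - \sum_{c=1}^{D}\left((1-t)p_c + tq_c\right)^3\right]}{12t(1-t)\left(\sum_{c=2}^{D} p_c \sum_{d=1}^{c-1} q_d + 0.5\sum_{c=1}^{D} p_c q_c - 0.5\right)^2}.\]
When there are no ties, the sample size formula simplifies to
\[N = \frac{(Z_{\alpha/2} + Z_{\beta})^2}{12t(1-t)(\pi - 0.5)^2}.\]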
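A minimal sketch of the sample size calculation follows. It assumes that the tie-adjusted and no-ties formulas take the forms written in the docstrings, that \(X\) corresponds to the group with proportions \(p\), and that \(t\) is the fraction of subjects allocated to the group with proportions \(q\). The function names, default arguments and example proportions are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def competing_probability(p, q):
    """pi = sum_c p_c * (sum_{d<c} q_d + 0.5 * q_c) for ordered categories."""
    pi, lower_q = 0.0, 0.0
    for pc, qc in zip(p, q):
        pi += pc * (lower_q + 0.5 * qc)
        lower_q += qc
    return pi


def wmw_sample_size(p, q, t=0.5, alpha=0.05, power=0.8):
    """Assumed tie-adjusted form:
    N = (z_{a/2} + z_b)^2 * (1 - sum_c ((1-t) p_c + t q_c)^3)
        / (12 t (1-t) (pi - 0.5)^2)
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    pi = competing_probability(p, q)
    tie_adj = 1 - sum(((1 - t) * pc + t * qc) ** 3 for pc, qc in zip(p, q))
    return (z_a + z_b) ** 2 * tie_adj / (12 * t * (1 - t) * (pi - 0.5) ** 2)


def wmw_sample_size_no_ties(pi, t=0.5, alpha=0.05, power=0.8):
    """Assumed no-ties form: N = (z_{a/2} + z_b)^2 / (12 t (1-t) (pi - 0.5)^2)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 / (12 * t * (1 - t) * (pi - 0.5) ** 2)


# Hypothetical design: four ordered categories, equal allocation.
p = [0.2, 0.3, 0.3, 0.2]
q = [0.1, 0.2, 0.3, 0.4]
n_ties = wmw_sample_size(p, q)
n_no_ties = wmw_sample_size_no_ties(competing_probability(p, q))
print(ceil(n_ties), ceil(n_no_ties))
```

In practice the result is rounded up to the next integer, and the group sizes follow as \(n = tN\) and \(m = (1-t)N\). For the same \(\pi\), the tie-adjusted formula returns a smaller \(N\) than the no-ties formula because the factor in square brackets is below 1.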
[1] Yan D. Zhao, Dewi Rahardja, and Yongming Qu. Sample size calculation for the Wilcoxon–Mann–Whitney test adjusting for ties. Statistics in Medicine, 27(3):462–468, 2008.