Reputation: 127
I am trying to implement the Bootstrapping Two Medians procedure either in R or in Matlab, by following the procedure described in that article:
I started to get this method in R:
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)
However, I do not know how to "calculate the difference between the medians" from those "bx" and "by".
Any suggestion, in order to implement the Bootstrapping Two Medians method?
Upvotes: 0
Views: 129
Reputation: 76641
In order to call boot
just once, define a function that accepts both vectors and returns the difference of the medians.
library(boot)
# data recreated with pseudo-RNG seed set
# so that the results are reproducible
set.seed(2024)
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
# bootstrap statistic
# argument 'data' is a 2 column matrix
fboot <- function(data, i) {
x <- data[i, 1L]
y <- data[i, 2L]
median(x) - median(y)
}
# set the replicates just once, makes the code simpler
R <- 1000L
# OP code
bx <- boot(data = x, statistic = function(x,i) median(x[i]), R = R)
by <- boot(data = y, statistic = function(y,i) median(y[i]), R = R)
# using the function fboot
b1 <- boot(data = cbind(x, y), statistic = fboot, R = R)
# the results are comparable to the results above
bb <- bx$t - by$t
cut(bb, pretty(bb)) |> table()
#>
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5]
#> 11 78 301 387 184 31 8
cut(b1$t, pretty(b1$t)) |> table()
#>
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5]
#> 11 76 299 385 164 57 8
# see them
old_par <- par(mfrow = c(1, 2))
hist(bb)
abline(v = mean(bb), col = "red")
hist(b1$t)
abline(v = mean(b1$t), col = "red")
par(old_par)
# final check
median(x) - median(y)
#> [1] -2.440257
mean(b1$t)
#> [1] -2.336104
mean(bb)
#> [1] -2.359999
# 95% confidence intervals
quantile(b1$t, c(0.025, 0.975))
#> 2.5% 97.5%
#> -3.326367 -1.258051
quantile(bb, c(0.025, 0.975))
#> 2.5% 97.5%
#> -3.335830 -1.342738
Created on 2024-09-23 with reprex v2.1.0
Upvotes: 1
Reputation: 127
Thanks to the suggestions from @Roland, I think I got the Bootstrapping Two Medians procedure:
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
# (1) Bootstrap each sample separately, creating the sampling distribution for each median
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)
# (2) Then calculate the difference between the medians, and create the sampling distribution of those differences.
median_differences <- by$t - bx$t
# (3) Once we have that distribution we can establish a confidence interval on that, and report the result.
CI <- quantile(median_differences, probs = c(0.025, 0.975))
# (4) If the confidence interval does not include 0, we can reject the null hypothesis that there is no difference between the medians of the two conditions.
if (any(0 >= CI[1] & 0 <= CI[2])) {
H <- 0
print('We fail to reject the null hypothesis that there is no difference between the medians.')
} else {
H <- 1
print('We reject the null hypothesis that there is no difference between the medians.')
}
print(list(H = H, CI = CI))
Resulting in:
[1] "We reject the null hypothesis that there is no difference between the medians."
> print(list(H = H, CI = CI))
$H
[1] 1
$CI
2.5% 97.5%
0.7656251 2.5636433
Upvotes: 1