Bootstrapping Two Medians with "boot" function in R

Question

I am trying to implement the Bootstrapping Two Medians procedure either in R or in Matlab, by following the procedure described in that article:

Bootstrap each sample separately, creating the sampling distribution for each median.
Then calculate the difference between the medians, and create the sampling distribution of those differences. This is the sampling distribution we care about.
Once we have that distribution we can establish a confidence interval on that, and report the result.
If the confidence interval does not include 0, we can reject the null hypothesis that there is no difference between the medians of the two conditions.

I started to get this method in R:

x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)

However, I do not know how to "calculate the difference between the medians" from those "bx" and "by".

Any suggestion, in order to implement the Bootstrapping Two Medians method?

Rui Barradas · Accepted Answer

In order to call boot just once, define a function that accepts both vectors and returns the difference of the medians.

library(boot)

# data recreated with pseudo-RNG seed set
# so that the results are reproducible
set.seed(2024)
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)

# bootstrap statistic
# argument 'data' is a 2 column matrix
fboot <- function(data, i) {
  x <- data[i, 1L]
  y <- data[i, 2L]
  median(x) - median(y)
}

# set the replicates just once, makes the code simpler
R <- 1000L

# OP code
bx <- boot(data = x, statistic = function(x,i) median(x[i]), R = R)
by <- boot(data = y, statistic = function(y,i) median(y[i]), R = R)

# using the function fboot
b1 <- boot(data = cbind(x, y), statistic = fboot, R = R)

# the results are comparable to the results above
bb <- bx$t - by$t
cut(bb, pretty(bb)) |> table()
#> 
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
#>        11        78       301       387       184        31         8
cut(b1$t, pretty(b1$t)) |> table()
#> 
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
#>        11        76       299       385       164        57         8

# see them
old_par <- par(mfrow = c(1, 2))
hist(bb)
abline(v = mean(bb), col = "red")
hist(b1$t)
abline(v = mean(b1$t), col = "red")

par(old_par)

# final check
median(x) - median(y)
#> [1] -2.440257
mean(b1$t)
#> [1] -2.336104
mean(bb)
#> [1] -2.359999

# 95% confidence intervals
quantile(b1$t, c(0.025, 0.975))
#>      2.5%     97.5% 
#> -3.326367 -1.258051
quantile(bb, c(0.025, 0.975))
#>      2.5%     97.5% 
#> -3.335830 -1.342738

^{Created on 2024-09-23 with reprex v2.1.0}

Bootstrapping Two Medians with "boot" function in R

Answers (2)

Related Questions

Bootstrapping Two Medians with &quot;boot&quot; function in R

Answers (2)

Related Questions

Bootstrapping Two Medians with "boot" function in R