BostonPlummer
BostonPlummer

Reputation: 127

Bootstrapping Two Medians with "boot" function in R

I am trying to implement the Bootstrapping Two Medians procedure either in R or in Matlab, by following the procedure described in that article:

  1. Bootstrap each sample separately, creating the sampling distribution for each median.
  2. Then calculate the difference between the medians, and create the sampling distribution of those differences. This is the sampling distribution we care about.
  3. Once we have that distribution we can establish a confidence interval on that, and report the result.
  4. If the confidence interval does not include 0, we can reject the null hypothesis that there is no difference between the medians of the two conditions.

I started to get this method in R:

x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)

However, I do not know how to "calculate the difference between the medians" from those "bx" and "by".

Any suggestion, in order to implement the Bootstrapping Two Medians method?

Upvotes: 0

Views: 129

Answers (2)

Rui Barradas
Rui Barradas

Reputation: 76641

In order to call boot just once, define a function that accepts both vectors and returns the difference of the medians.

library(boot)

# data recreated with pseudo-RNG seed set
# so that the results are reproducible
set.seed(2024)
x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)

# bootstrap statistic
# argument 'data' is a 2 column matrix
fboot <- function(data, i) {
  x <- data[i, 1L]
  y <- data[i, 2L]
  median(x) - median(y)
}

# set the replicates just once, makes the code simpler
R <- 1000L

# OP code
bx <- boot(data = x, statistic = function(x,i) median(x[i]), R = R)
by <- boot(data = y, statistic = function(y,i) median(y[i]), R = R)

# using the function fboot
b1 <- boot(data = cbind(x, y), statistic = fboot, R = R)

# the results are comparable to the results above
bb <- bx$t - by$t
cut(bb, pretty(bb)) |> table()
#> 
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
#>        11        78       301       387       184        31         8
cut(b1$t, pretty(b1$t)) |> table()
#> 
#> (-4,-3.5] (-3.5,-3] (-3,-2.5] (-2.5,-2] (-2,-1.5] (-1.5,-1] (-1,-0.5] 
#>        11        76       299       385       164        57         8

# see them
old_par <- par(mfrow = c(1, 2))
hist(bb)
abline(v = mean(bb), col = "red")
hist(b1$t)
abline(v = mean(b1$t), col = "red")

par(old_par)

# final check
median(x) - median(y)
#> [1] -2.440257
mean(b1$t)
#> [1] -2.336104
mean(bb)
#> [1] -2.359999

# 95% confidence intervals
quantile(b1$t, c(0.025, 0.975))
#>      2.5%     97.5% 
#> -3.326367 -1.258051
quantile(bb, c(0.025, 0.975))
#>      2.5%     97.5% 
#> -3.335830 -1.342738

Created on 2024-09-23 with reprex v2.1.0

Upvotes: 1

BostonPlummer
BostonPlummer

Reputation: 127

Thanks to the suggestions from @Roland, I think I got the Bootstrapping Two Medians procedure:

x <- rnorm(100, mean = 10, sd = 2)
y <- rnorm(100, mean = 12, sd = 3)
# (1) Bootstrap each sample separately, creating the sampling distribution for each median
bx <- boot(data = x,statistic = function(x,i) median(x[i]),R = 1000)
by <- boot(data = y,statistic = function(y,i) median(y[i]),R = 1000)
# (2) Then calculate the difference between the medians, and create the sampling distribution of those differences.
median_differences <- by$t - bx$t
# (3) Once we have that distribution we can establish a confidence interval on that, and report the result.
CI <- quantile(median_differences, probs = c(0.025, 0.975))
# (4) If the confidence interval does not include 0, we can reject the null hypothesis that there is no difference between the medians of the two conditions.
if (any(0 >= CI[1] & 0 <= CI[2])) {
  H <- 0
  print('We fail to reject the null hypothesis that there is no difference between the medians.')
} else {
  H <- 1
  print('We reject the null hypothesis that there is no difference between the medians.')
}
print(list(H = H, CI = CI))

Resulting in:

[1] "We reject the null hypothesis that there is no difference between the medians."
> print(list(H = H, CI = CI))
$H
[1] 1

$CI
     2.5%     97.5% 
0.7656251 2.5636433 

Upvotes: 1

Related Questions