Z B
Z B

Reputation: 131

How to code the permutation equivalent of Mood's Median Test in R? (get the p values using permutation)

I can do it for the two sample t test but not for Median test or Wilcoxon test or Hodges Lehmann test

data_2000 <- c(500,450,600,700,550,551,552)

data_2019 <- c(560,460,620,720,540,600,750)

mean(data_2000)

mean(data_2019)

mean(data_2019) - mean(data_2000)

combined_data <- c(data_2000, data_2019)

set.seed(123)

null_dist <- c()
for (i in 1:100000) {
  shuffled_data <- sample(combined_data)  
  shuffled_2000 <- shuffled_data[1:7] 
  shuffled_2019 <- shuffled_data[8:14]  
  null_dist[i] <- mean(shuffled_2019) - mean(shuffled_2000)
}

(p_value <- (sum(null_dist >= 49.57143) + sum(null_dist <= 
 `enter code here`-49.57143))/length(null_dist))

Upvotes: 2

Views: 235

Answers (2)

Chuck P
Chuck P

Reputation: 3923

I think this is what you're trying to do. I altered your code as little as possible. There are packages like infer that will do this for you and the for loop is not the most efficient but it's plenty good enough and may help you learn. As long as we're looping I did mean and median at the same time since all other parts of the code are identical. ifelse is a nice easy way to make 1s and 0s to sum.


data_2000 <- c(500,450,600,700,550,551,552)
data_2019 <- c(560,460,620,720,540,600,750)

delta_mean <- mean(data_2019) - mean(data_2000)
delta_median <- median(data_2019) - median(data_2000)

combined_data <- c(data_2000, data_2019)

trials <- 100000

set.seed(123)

mean_diff <- c()
median_diff <- c()

for (i in 1:trials) {
   shuffled_data <- sample(combined_data)  
   shuffled_2000 <- shuffled_data[1:7] 
   shuffled_2019 <- shuffled_data[8:14]  
   mean_diff[i] <- mean(shuffled_2019) - mean(shuffled_2000)
   median_diff[i] <- median(shuffled_2019) - median(shuffled_2000)
}

p_mean <- sum(ifelse(mean_diff > delta_mean | mean_diff < -1 * delta_mean, 1, 0)) / trials
p_median <- sum(ifelse(median_diff > delta_median | median_diff < -1 * delta_median, 1, 0)) / trials

p_mean
#> [1] 0.31888
p_median
#> [1] 0.24446

Following up on your question about HL test. Quoting Wikipedia

The Hodges–Lehmann statistic also estimates the difference between two populations. For two sets of data with m and n observations, the set of two-element sets made of them is their Cartesian product, which contains m × n pairs of points (one from each set); each such pair defines one difference of values. The Hodges–Lehmann statistic is the median of the m × n differences.

You could run it on your data with the following code...

Do NOT run it 100,000 times the answer is the same everytime because you're already making all 49 possible pairings

hl_df <- expand.grid(data_2019, data_2000)
hl_df$pair_diffs <- hl_df$Var1 - hl_df$Var2
median(hl_df$pair_diffs)
[1] 49

Upvotes: 2

Ben Norris
Ben Norris

Reputation: 5747

You can do the Wilcoxon test with wilcox.test in the stats package (loaded by default as part of R core). You need to set exact = FALSE because an exact p-value is not possible if there are ties.

wilcox.test(data_2019, data_2000, exact = FALSE)
    Wilcoxon rank sum test with continuity correction

data:  data_2019 and data_2000
W = 33.5, p-value = 0.2769
alternative hypothesis: true location shift is not equal to 0

I'll update this when I figure out how to do the other tests.

Upvotes: 0

Related Questions