tcvdb1992
tcvdb1992

Reputation: 455

Determine the difference between the medians of two groups with 95% CI in R (not the median of the differences)

I have data on a change in a continuous outcome and the baseline score of this outcome of 49 patients. Further I've made a factor variable dividing patients in a low baseline score (Q1) or a high baseline score (Q2) based on the median baseline score. This data looks as follows:

library(boot)

mydata <-
  structure(
    list(
      ID=c(4, 13, 20, 24, 30, 34, 37, 38, 48, 49, 51, 52, 54, 58, 75, 80, 81, 82, 83, 84, 92, 95, 103, 104, 115, 
           117, 125, 127, 138, 141, 153, 160, 172, 180, 185, 197, 198, 202, 205, 213, 221, 253, 255, 258, 262, 
           271, 277, 279, 320), 
      change_continuous_outcome = c(694, 52, 1500, 195, 53, 54, -500, 2, -21, 394, -10, -38, 43, 1500, 
                                    -500, -11, 8, 149, 0, 473, 8, 797, 313, 9, 263, 1219, 68, 216, 
                                    75, 0, 95, 698, -1, 750, 168, 251, -381, 19, 70, 0, 182, 4, -28, 
                                    36, 37, 18, 3, 928, -4), 
      baseline_continuous_outcome = c(2646.8, 3112.4, 10661.6, 5706.7, 81.5, 3730.4, 196.1, 83.9, 177.3, 1976.7, 
                                      3196.8, 2007.5, 63.2, 7594.5, 3261.8, 155.2, 57.2, 11189.7, 0, 
                                      2800.8, 13.9, 3484.5, 3528.1, 3636.6, 9.1, 5681.4, 67.9, 205.4, 138.4, 
                                      3141.1, 138.5, 3795.9, 152.7, 7349.1, 2123.4, 122, 5935.8, 100.7, 
                                      2023.4, 4095.4, 2636.1, 11.9, 2241.1, 198.2, 186, 20.2, 97.7, 6709.8, 169.5), 
      q2vsq1_baseline_cont_outcome = structure(c(2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 
                                                 1L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 
                                                 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 
                                                 1L, 1L, 2L, 1L ), .Label = c("0", "1"), class = "factor")), 
    row.names = c(NA, -49L), 
    class = c("tbl_df", "tbl", "data.frame"))

I've performed a Wilcoxon Rank Sum test to compare the continuous_outcome_change variable between patients with a low baseline score and a high baseline score:

wilcox.test(mydata$change_continuous_outcome ~ mydata$q2vsq1_baseline_cont_outcome)
Wilcoxon rank sum test with continuity correction

data:  mydata$change_continuous_outcome by mydata$q2vsq1_baseline_cont_outcome
W = 201.5, p-value = 0.04995
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(x = c(53, -500, 2, -21, 394, 43, -11, 8,  :
  cannot compute exact p-value with ties

Now I am interested in computing the difference between the two medians of the groups including a 95% confidence interval. I want to use the boot function to do this, which takes two arguments: one for the data and one to index the data. So I need to write a function that indexes my data/calculates the median between the groups. Borrowing something I found elsewhere (https://data.library.virginia.edu/the-wilcoxon-rank-sum-test/) I made:

med.diff <- function(d, i) {
  mydata <- d[i,] 
  median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"]) - 
    median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"])
}

boot_result <- boot(data=mydata, statistic=med.diff, R=1000)
median(boot_result$t)
boot.ci(boot_result, type = "perc")

However this returns NA results. Is there something faulty in my formula? Or is the problem elsewhere? Thanks in advance!

Upvotes: 0

Views: 1097

Answers (1)

cazman
cazman

Reputation: 1492

From what I can tell, the error you are getting is from the line:

median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="2"])

which is NA. When your data structure for the baseline count outcome was defined, you converted it to a factor, but relabeled it. So the integers 1 and 2 look to be relabeled as 0 and 1 in the data frame. Then your search for the value of "2" in that column returns NA because it doesn't exist. If you change your function to:

med.diff <- function(d, i) {
  mydata <- d[i,] 
  median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="1"]) - 
    median(mydata$change_continuous_outcome[mydata$q2vsq1_baseline_cont_outcome=="0"])
}  

you get:

median(boot_result$t)
> 143

boot.ci(boot_result, type = "perc")
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates

CALL : 
boot.ci(boot.out = boot_result, type = "perc")

Intervals : 
Level     Percentile     
95%   ( -1.0, 579.4 )  
Calculations and Intervals on Original Scale

Upvotes: 1

Related Questions