Linda Jasmine Hoffman
Linda Jasmine Hoffman

Reputation: 41

How Can I Get Bootstrapped BCa Confidence Intervals for Spearman's Correlation Matrix in R?

Good evening R experts,

I am generating a Spearman's correlation matrix in R using all variables under study for my current project. To be consistent with the other analyses I am conducting, I would like to construct BCa bootstrapped confidence intervals using 10,000 iterations around each Spearman's coefficient. However, I have not found an efficient and reliable method by which to achieve this end.

I have created my Spearman's correlation matrix with coefficients and p-values using the following code. Note that I have generated random data for the sake of this example.

library(psych)
library(rstatix)

df_test <- data.frame(
    recall=c(45, 1, 32, 17, 79, 15, 75, 100, 43, 80, 74, 91, 60, 54, 67, 26, 97, 53, 51, 30),
    recog=c(90, 29, 93, 73, 34, 68, 78, 56, 92, 85, 35, 81, 7, 58, 4, 52, 82, 31, 6, 23),
    hits=c(77, 89, 44, 8, 70, 96, 76, 62, 95, 27, 49, 12, 16, 28, 83, 2, 36, 10, 61, 86),
    misses=c(59, 78, 14, 44, 86, 61, 80, 72, 25, 93, 5, 42, 64, 95, 73, 54, 7, 67, 11, 53),
    false_alarms=c(32, 34, 55, 41, 77, 76, 89, 36, 12, 100, 70, 62, 47, 81, 90, 63, 13, 83, 79, 38)
)

df_test.cor <- cor(df_test, y = NULL, use = "pairwise.complete.obs",
    method = c("spearman"))

#Get coefficients
df_test.mat <- cor_mat(
  df_test,
  vars = NULL,
  method = "spearman",
  alternative = "two.sided",
  conf.level = 0.95
)

df_test.mat

#Get sig values
df_test.pmat <- cor_pmat(
  df_test,
  vars = NULL,
  method = "spearman",
  alternative = "two.sided",
  conf.level = 0.95
)

df_test.pmat

ggcorrplot(df_test.mat,
           type = "lower", method = "circle", colors = c("turquoise3", "khaki1", "violetred3"), p.mat = df_test.pmat)


cor.ci(df_test, keys = NULL, n.iter = 10000,  p = 0.050, poly = FALSE, method = "spearman")

The results are as follows:

1. Correlation coefficients with p-values: enter image description here

2. Visualization

enter image description here

3. Bootstrapped 95% CI's Using psych Package enter image description here

There are a number of issues with these confidence intervals, enumerated below:

  1. First, the p-values from the CI calculation do not correspond to those generated using the cor_pmat() command from the rstatix package.
  2. The CI approach used here does not allow me to select BCa as the bootstrapping method.
  3. The p-values and CI boundaries are only reported up to 2 decimal places using the aforementioned approach. I require greater precision for my project, ideally with p-values and CI boundaries calculated up to 3 or 4 decimals.

Hence, my question is: How might I construct bootstrapped confidence intervals around Spearman's coefficients in my entire correlation matrix using the BCa approach with precision up to 4 decimal places?

Upvotes: 3

Views: 371

Answers (2)

PBulls
PBulls

Reputation: 1731

I arrived at a very similar solution to Mikko Marttila's, though I still think there's no need to fixate so much on the BCa interval. I ran a simulation on the standard bivariate normal with correlation 0.5, for which the Spearman rank correlation is also known, and determined coverage properties of the most common bootstrap confidence intervals:

## Using bivariate normal sampling distribution
rho <- .5
sigma <- matrix(rho, ncol=2, nrow=2)
diag(sigma) <- 1

## True Spearman rank correlation + coverage function
GROUND_TRUTH <- (6/pi)*asin(rho/2)
covers <- function(ci) (ci[1] <= GROUND_TRUTH) & (GROUND_TRUTH <= ci[2])

## Function to bootstrap a single Spearman correlation
bootfun <- function(data, i) {
   r <- cor(data[i,], method = "spearman")
   r[lower.tri(r)]
}

## Generate one sample & bootstrap it
## Returns whether the obtained intervals cover the true parameter
one_boot <- function(seed, n=100) {
   set.seed(seed)
   data <- mvtnorm::rmvnorm(n, sigma = sigma)
   b <- boot::boot(data, bootfun, R=1E4)
   ci <- boot::boot.ci(b, type = c("norm", "basic", "perc", "bca"))
   c("norm" = covers(ci$normal[2:3]),  "basic"= covers(ci$basic[4:5]),
     "perc" = covers(ci$percent[4:5]), "bca"  = covers(ci$bca[4:5]))
}

## Generate many samples & bootstrap each of them
## Returns coverage properties of each interval
future::plan("multisession", workers=4)
many_boots <- future.apply::future_vapply(seq_len(1E4), one_boot,
                                          logical(4), future.seed = TRUE) |>
   matrixStats::rowMeans2()
#>   norm  basic   perc    bca 
#> 0.9399 0.9297 0.9535 0.9544 

As you can see, the raw percentiles have at least as good coverage properties as the BCa interval, and this is under the theoretically perfect standard normal (where the acceleration factor is also exact). If your sampling distribution has less well-defined skewness or isn't smooth this interval will perform worse than other alternatives.

As a bonus, here's how you can obtain a single two-sided P-value for Spearman correlation through resampling (under the null!):

permute_spearman <- function(x, y, iters=1E4) {
   t0 <- cor(x, y, method="spearman")
   mean(vapply(seq_len(iters), \(d) {
      abs(cor(x, y[sample(seq_along(y))], method="spearman")) > abs(t0)
   }, logical(1)))
}

Upvotes: 1

Mikko Marttila
Mikko Marttila

Reputation: 11898

You can calculate BCa intervals with the boot package.

First, create a statistic function for the Spearman correlations for a subset i of the data:

spearman <- function(d, i) {
  rho <- cor(d[i, ], method = "spearman")
  rho[lower.tri(rho)]
}

Then, get bootstrap samples with boot():

set.seed(42)
boot_out <- boot::boot(df_test, spearman, R = 10000)
boot_out
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot::boot(data = df_test, statistic = spearman, R = 10000)
#> 
#> 
#> Bootstrap Statistics :
#>         original       bias    std. error
#> t1*   0.09172932 -0.004161858   0.2181451
#> t2*  -0.22105263  0.008094763   0.2623494
#> t3*   0.13684211 -0.009041776   0.2462778
#> t4*   0.22406015 -0.004986228   0.2420079
#> t5*  -0.05413534  0.003039217   0.2235030
#> t6*  -0.18345865  0.006896525   0.2341026
#> t7*  -0.29022556  0.006665581   0.2469688
#> t8*   0.08872180 -0.004816326   0.1924127
#> t9*  -0.20902256  0.011034138   0.2315252
#> t10*  0.48120301 -0.021141040   0.2029307

Finally, calculate confidence intervals with boot.ci(). index specifies which element of the output statistic you want the intervals for:

boot::boot.ci(boot_out, type = "bca", index = 10)
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 10000 bootstrap replicates
#> 
#> CALL : 
#> boot::boot.ci(boot.out = boot_out, type = "bca", index = 10)
#> 
#> Intervals : 
#> Level       BCa          
#> 95%   (-0.0322,  0.7649 )  
#> Calculations and Intervals on Original Scale

A vector index has a different meaning (the 2nd element is interpreted as giving the variance of the 1st), so to get all CIs you’ll need to explicitly iterate, for example with lapply():

lapply(seq_along(boot_out$t0), function(i) {
  boot::boot.ci(boot_out, type = "bca", index = i)
})

If you want only the intervals from the output, you can extract them from the $bca element:

sapply(seq_along(boot_out$t0), function(i) {
  boot::boot.ci(boot_out, type = "bca", index = i)$bca[, 4:5]
})
#>        [,1]       [,2]       [,3]       [,4]       [,5]       [,6]       [,7]
#>  -0.3439443 -0.6648910 -0.3743482 -0.3385006 -0.4651028 -0.6057627 -0.7137479
#>   0.5040802  0.3668189  0.5936317  0.6228635  0.4083453  0.3079268  0.2545514
#>        [,8]       [,9]      [,10]
#>  -0.2991324 -0.6217852 -0.0321938
#>   0.4506211  0.2709281  0.7648544

Upvotes: 0

Related Questions