Reputation: 41
Good evening R experts,
I am generating a Spearman's correlation matrix in R using all variables under study for my current project. To be consistent with the other analyses I am conducting, I would like to construct BCa bootstrapped confidence intervals using 10,000 iterations around each Spearman's coefficient. However, I have not found an efficient and reliable method by which to achieve this end.
I have created my Spearman's correlation matrix with coefficients and p-values using the following code. Note that I have generated random data for the sake of this example.
library(psych)
library(rstatix)
df_test <- data.frame(
recall=c(45, 1, 32, 17, 79, 15, 75, 100, 43, 80, 74, 91, 60, 54, 67, 26, 97, 53, 51, 30),
recog=c(90, 29, 93, 73, 34, 68, 78, 56, 92, 85, 35, 81, 7, 58, 4, 52, 82, 31, 6, 23),
hits=c(77, 89, 44, 8, 70, 96, 76, 62, 95, 27, 49, 12, 16, 28, 83, 2, 36, 10, 61, 86),
misses=c(59, 78, 14, 44, 86, 61, 80, 72, 25, 93, 5, 42, 64, 95, 73, 54, 7, 67, 11, 53),
false_alarms=c(32, 34, 55, 41, 77, 76, 89, 36, 12, 100, 70, 62, 47, 81, 90, 63, 13, 83, 79, 38)
)
df_test.cor <- cor(df_test, y = NULL, use = "pairwise.complete.obs",
method = c("spearman"))
#Get coefficients
df_test.mat <- cor_mat(
df_test,
vars = NULL,
method = "spearman",
alternative = "two.sided",
conf.level = 0.95
)
df_test.mat
#Get sig values
df_test.pmat <- cor_pmat(
df_test,
vars = NULL,
method = "spearman",
alternative = "two.sided",
conf.level = 0.95
)
df_test.pmat
ggcorrplot(df_test.mat,
type = "lower", method = "circle", colors = c("turquoise3", "khaki1", "violetred3"), p.mat = df_test.pmat)
cor.ci(df_test, keys = NULL, n.iter = 10000, p = 0.050, poly = FALSE, method = "spearman")
The results are as follows:
1. Correlation coefficients with p-values:
2. Visualization
3. Bootstrapped 95% CI's Using psych
Package
There are a number of issues with these confidence intervals, enumerated below:
cor_pmat()
command from the rstatix
package.Hence, my question is: How might I construct bootstrapped confidence intervals around Spearman's coefficients in my entire correlation matrix using the BCa approach with precision up to 4 decimal places?
Upvotes: 3
Views: 371
Reputation: 1731
I arrived at a very similar solution to Mikko Marttila's, though I still think there's no need to fixate so much on the BCa interval. I ran a simulation on the standard bivariate normal with correlation 0.5, for which the Spearman rank correlation is also known, and determined coverage properties of the most common bootstrap confidence intervals:
## Using bivariate normal sampling distribution
rho <- .5
sigma <- matrix(rho, ncol=2, nrow=2)
diag(sigma) <- 1
## True Spearman rank correlation + coverage function
GROUND_TRUTH <- (6/pi)*asin(rho/2)
covers <- function(ci) (ci[1] <= GROUND_TRUTH) & (GROUND_TRUTH <= ci[2])
## Function to bootstrap a single Spearman correlation
bootfun <- function(data, i) {
r <- cor(data[i,], method = "spearman")
r[lower.tri(r)]
}
## Generate one sample & bootstrap it
## Returns whether the obtained intervals cover the true parameter
one_boot <- function(seed, n=100) {
set.seed(seed)
data <- mvtnorm::rmvnorm(n, sigma = sigma)
b <- boot::boot(data, bootfun, R=1E4)
ci <- boot::boot.ci(b, type = c("norm", "basic", "perc", "bca"))
c("norm" = covers(ci$normal[2:3]), "basic"= covers(ci$basic[4:5]),
"perc" = covers(ci$percent[4:5]), "bca" = covers(ci$bca[4:5]))
}
## Generate many samples & bootstrap each of them
## Returns coverage properties of each interval
future::plan("multisession", workers=4)
many_boots <- future.apply::future_vapply(seq_len(1E4), one_boot,
logical(4), future.seed = TRUE) |>
matrixStats::rowMeans2()
#> norm basic perc bca
#> 0.9399 0.9297 0.9535 0.9544
As you can see, the raw percentiles have at least as good coverage properties as the BCa interval, and this is under the theoretically perfect standard normal (where the acceleration factor is also exact). If your sampling distribution has less well-defined skewness or isn't smooth this interval will perform worse than other alternatives.
As a bonus, here's how you can obtain a single two-sided P-value for Spearman correlation through resampling (under the null!):
permute_spearman <- function(x, y, iters=1E4) {
t0 <- cor(x, y, method="spearman")
mean(vapply(seq_len(iters), \(d) {
abs(cor(x, y[sample(seq_along(y))], method="spearman")) > abs(t0)
}, logical(1)))
}
Upvotes: 1
Reputation: 11898
You can calculate BCa intervals with the boot package.
First, create a statistic function for the Spearman correlations for a subset i
of the data:
spearman <- function(d, i) {
rho <- cor(d[i, ], method = "spearman")
rho[lower.tri(rho)]
}
Then, get bootstrap samples with boot()
:
set.seed(42)
boot_out <- boot::boot(df_test, spearman, R = 10000)
boot_out
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = df_test, statistic = spearman, R = 10000)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 0.09172932 -0.004161858 0.2181451
#> t2* -0.22105263 0.008094763 0.2623494
#> t3* 0.13684211 -0.009041776 0.2462778
#> t4* 0.22406015 -0.004986228 0.2420079
#> t5* -0.05413534 0.003039217 0.2235030
#> t6* -0.18345865 0.006896525 0.2341026
#> t7* -0.29022556 0.006665581 0.2469688
#> t8* 0.08872180 -0.004816326 0.1924127
#> t9* -0.20902256 0.011034138 0.2315252
#> t10* 0.48120301 -0.021141040 0.2029307
Finally, calculate confidence intervals with boot.ci()
. index
specifies
which element of the output statistic you want the intervals for:
boot::boot.ci(boot_out, type = "bca", index = 10)
#> BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
#> Based on 10000 bootstrap replicates
#>
#> CALL :
#> boot::boot.ci(boot.out = boot_out, type = "bca", index = 10)
#>
#> Intervals :
#> Level BCa
#> 95% (-0.0322, 0.7649 )
#> Calculations and Intervals on Original Scale
A vector index
has a different meaning (the 2nd element is interpreted as giving the variance of the 1st), so to get all CIs
you’ll need to explicitly iterate, for example with lapply()
:
lapply(seq_along(boot_out$t0), function(i) {
boot::boot.ci(boot_out, type = "bca", index = i)
})
If you want only the intervals from the output, you can extract them from the $bca
element:
sapply(seq_along(boot_out$t0), function(i) {
boot::boot.ci(boot_out, type = "bca", index = i)$bca[, 4:5]
})
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> -0.3439443 -0.6648910 -0.3743482 -0.3385006 -0.4651028 -0.6057627 -0.7137479
#> 0.5040802 0.3668189 0.5936317 0.6228635 0.4083453 0.3079268 0.2545514
#> [,8] [,9] [,10]
#> -0.2991324 -0.6217852 -0.0321938
#> 0.4506211 0.2709281 0.7648544
Upvotes: 0