How can I check for convergence in Sobol' sensitivity indices, using sensobol?

Question

I would like to check the convergence of Sobol' sensitivity indices, using the sensobol library, by re-computing the sensitivity indices using sub-samples of decreasing size extracted from the original sample.

Here, I present an example code using the Ishigami function as model. Since computing the model output takes very long with the model I actually use, I want to avoid recomputing the model output for different sample sizes, but want to use sub-samples of my overall sample for this check.

I have written code that runs through, however, it seems that the result is 'not correct', as soon as the sample size is not equal the initial sample size.

Inital set-up

library(sensobol)

# Define settings
matrices <- c("A", "B", "AB", "BA")
N <- 1000
params <- paste("X", 1:3, sep = "")
first <- total <- "azzini"
order <- "first"
R <- 10
type <- "percent"
conf <- 0.95                                                                    

# Create sample matrix using Sobol' (1967) quasi-random numbers
mat <- sobol_matrices(matrices = matrices, N = N, params = params, order = order, type = "QRN")   

# Compute model output using Ishigami function as model
Y <- ishigami_Fun(mat)

Correct Sobol' indices as benchmark result

# Compute and bootstrap Sobol' indices for entire sample N
ind <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
                     Y = Y, 
                     N = N, 
                     params = params, 
                     boot = TRUE, 
                     first = "azzini",
                     total = "azzini",
                     order = "first",
                     R = R, 
                     type = type, 
                     conf = conf)
cols <- colnames(ind)[1:length(params)]
ind[ , (cols):= round(.SD, 3), .SDcols = (cols)]

Check for convergence

Now, to analyze whether convergence is reached, I want to re-compute the sensitivity indices using sub-samples of decreasing size extracted from the original sample

# function to compute sensitivity indices, depending on the sample size and the model output vector
fct_conv <- function(N, Y) {
  # compute how many model runs are performed in the case of the Azzini estimator
  nr_model_runs <- 2*N*(length(params)+1)   # length(params) = k
  
  # extract sub-sample of model output
  y_sub <- Y[1:nr_model_runs]
  
  # compute and bootstrap Sobol' indices
  ind_sub <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
                           Y = y_sub, 
                           N = N, 
                           params = params, 
                           boot = TRUE, 
                           first = "azzini",
                           total = "azzini",
                           order = "first",
                           R = R, 
                           type = type, 
                           conf = conf)
  cols <- colnames(ind_sub)[1:length(params)]
  ind_sub[ , (cols):= round(.SD, 3), .SDcols = (cols)]
  
  return(ind_sub)
}

Let's compare the benchmark result (ind) to two other outputs: Running fct_conv with the full sample (ind_full_sample) and running fct_conv with a very slightly reduced sample (ind_red_sample).

ind_full_sample <- fct_conv(1000, Y)
ind_red_sample  <- fct_conv(999, Y)

ind
ind_full_sample
ind_red_sample

It seems that as soon as the sample size is reduced, the result doesn't make sense. Why is that? I'd be glad for any hints or ideas!

Arnald Puy · Accepted Answer

The results do not make sense because you are sampling without considering the ordering of the sample matrix. Try the following

# Load the required packages:
library(sensobol) 
library(data.table)
library(ggplot2)

# Create function to swiftly check convergence (you do not need bootstrap)

sobol_convergence <- function(Y, N, sample.size, seed = 666) {
  dt <- data.table(matrix(Y, nrow = N))
  set.seed(seed) # To permit replication
  subsample <- unlist(dt[sample(.N, sample.size)], use.names = FALSE)
  ind <- sobol_indices(matrices = matrices,
                       Y = subsample, 
                       N = sample.size, 
                       params = params, 
                       first = first,
                       total = total,
                       order = order)
  return(ind)
}

# Define sequence of sub-samples at which you want to check convergence

sample.size <- seq(100, 1000, 50) # every 50

# Run function

ind.list <- lapply(sample.size, function(n) 
  sobol_convergence(Y = Y, N = N, sample.size = n))

# Extract total number of model runs C and results in each run

Cost <- indices <- list()
for(i in 1:length(ind.list)) {
  Cost[[i]] <- ind.list[[i]]$C
  indices[[i]] <- ind.list[[i]]$results
}

names(indices) <- Cost

# Final dataset

final.dt <- rbindlist(indices, idcol = "Cost")[, Cost:= as.numeric(Cost)]

# Plot results 

ggplot(final.dt, aes(Cost, original, color = sensitivity)) +
  geom_line() +
  labs(x = "Total number of model runs", y = "Sobol' indices") +
  facet_wrap(~parameters) + 
  theme_bw()

How can I check for convergence in Sobol' sensitivity indices, using sensobol?

Answers (1)

Related Questions

How can I check for convergence in Sobol&#39; sensitivity indices, using sensobol?

Answers (1)

Related Questions

How can I check for convergence in Sobol' sensitivity indices, using sensobol?