Yang Yang
Yang Yang

Reputation: 902

How to perform a Sobol sensitivity analysis using R package "sensitivity"?

I would like to perform a Sobol sensitivity analysis using the R package "sensitivity". However, I am not sure how to create the first and second random sample (X1, X2) in the sobol function. I assume that X1 and X2 are both a subset of the input data, however, the final results seem to be incorrect.

library(tidymodels)
library(sensitivity)

# Example data
set.seed(123)
x1 = runif(100)
x2 = runif(100)
x3 = runif(100)
y = 3 * x1 + 2 * x2 + x3 + rnorm(100)
data <- data.frame(x1, x2, x3, y)

# Split data into training and testing sets
set.seed(234)
data_split <- initial_split(data, prop = 0.8)
train_data <- training(data_split)
test_data <- testing(data_split)

# Create a linear regression model using tidymodels
lm_spec <- linear_reg() %>%
  set_engine("lm") %>%
  set_mode("regression")

lm_fit <- lm_spec %>%
  fit(y ~ x1 + x2 + x3, data = train_data)

# Define a model function
model_function <- function(x) {
  new_data <- data.frame(x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])
  predict(lm_fit, new_data)$`.pred`
}

# Perform Sobol sensitivity analysis
set.seed(345)
X_index1 = sample(x=1:100, size = 50, replace = FALSE)
X_index2 = c(1:length(data$x1))[-X_index1]

sobol_results <- sobol(model = model_function, 
                       X1 = data[X_index1, -4], 
                       X2 = data[X_index2, -4], 
                       nboot = 1000, order = 2)

sobol_results

sobol_results shows that the sensitivity is in the following order: x2>x1>x3. According to the function y = 3 * x1 + 2 * x2 + x3 + rnorm(100), "x1" should have a higher sensitivity as it is 3 * x1. How should I correct my code? Thank you.

enter image description here

Upvotes: 3

Views: 1205

Answers (1)

Devin F
Devin F

Reputation: 41

Your intuition is correct, but I'm not sure what is going wrong with your code. If you adapt the example in the sobol function documentation, you get

set.seed(123)
x1 = runif(100)
x2 = runif(100)
x3 = runif(100)
y = 3 * x1 + 2 * x2 + x3 + rnorm(100)
data <- data.frame(x1, x2, x3, y)

lm_fit <- lm(y ~ x1 + x2 + x3, data = data)
model_function <- function(x) {
  new_data <- data.frame(x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])
  predict(lm_fit, new_data)
}

n <- 1000
X1 <- data.frame(matrix(runif(3 * n), nrow = n))
X2 <- data.frame(matrix(runif(3 * n), nrow = n))

# sensitivity analysis
x <- sobol(model = model_function, X1 = X1, X2 = X2, order = 2, nboot = 100)
print(x)
Sobol indices
          original          bias std. error    min. c.i.  max. c.i.
X1     0.518462536 -0.0009910373 0.02878008  0.464823007 0.57126139
X2     0.402619755  0.0027740755 0.02861666  0.348205815 0.46152287
X3     0.085898363  0.0024478486 0.03920132 -0.003253836 0.15247817
X1*X2 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589
X1*X3 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589
X2*X3 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589

which is the correct ordering.

Upvotes: 0

Related Questions