Reputation: 902
I would like to perform a Sobol sensitivity analysis using the R package "sensitivity". However, I am not sure how to create the first and second random sample (X1, X2)
in the sobol
function. I assume that X1
and X2
are both a subset of the input data, however, the final results seem to be incorrect.
library(tidymodels)
library(sensitivity)
# Example data
set.seed(123)
x1 = runif(100)
x2 = runif(100)
x3 = runif(100)
y = 3 * x1 + 2 * x2 + x3 + rnorm(100)
data <- data.frame(x1, x2, x3, y)
# Split data into training and testing sets
set.seed(234)
data_split <- initial_split(data, prop = 0.8)
train_data <- training(data_split)
test_data <- testing(data_split)
# Create a linear regression model using tidymodels
lm_spec <- linear_reg() %>%
set_engine("lm") %>%
set_mode("regression")
lm_fit <- lm_spec %>%
fit(y ~ x1 + x2 + x3, data = train_data)
# Define a model function
model_function <- function(x) {
new_data <- data.frame(x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])
predict(lm_fit, new_data)$`.pred`
}
# Perform Sobol sensitivity analysis
set.seed(345)
X_index1 = sample(x=1:100, size = 50, replace = FALSE)
X_index2 = c(1:length(data$x1))[-X_index1]
sobol_results <- sobol(model = model_function,
X1 = data[X_index1, -4],
X2 = data[X_index2, -4],
nboot = 1000, order = 2)
sobol_results
sobol_results
shows that the sensitivity is in the following order: x2>x1>x3. According to the function y = 3 * x1 + 2 * x2 + x3 + rnorm(100)
, "x1" should have a higher sensitivity as it is 3 * x1
. How should I correct my code? Thank you.
Upvotes: 3
Views: 1205
Reputation: 41
Your intuition is correct, but I'm not sure what is going wrong with your code. If you adapt the example in the sobol
function documentation, you get
set.seed(123)
x1 = runif(100)
x2 = runif(100)
x3 = runif(100)
y = 3 * x1 + 2 * x2 + x3 + rnorm(100)
data <- data.frame(x1, x2, x3, y)
lm_fit <- lm(y ~ x1 + x2 + x3, data = data)
model_function <- function(x) {
new_data <- data.frame(x1 = x[, 1], x2 = x[, 2], x3 = x[, 3])
predict(lm_fit, new_data)
}
n <- 1000
X1 <- data.frame(matrix(runif(3 * n), nrow = n))
X2 <- data.frame(matrix(runif(3 * n), nrow = n))
# sensitivity analysis
x <- sobol(model = model_function, X1 = X1, X2 = X2, order = 2, nboot = 100)
print(x)
Sobol indices
original bias std. error min. c.i. max. c.i.
X1 0.518462536 -0.0009910373 0.02878008 0.464823007 0.57126139
X2 0.402619755 0.0027740755 0.02861666 0.348205815 0.46152287
X3 0.085898363 0.0024478486 0.03920132 -0.003253836 0.15247817
X1*X2 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589
X1*X3 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589
X2*X3 -0.003176265 -0.0021151727 0.03999432 -0.069155348 0.08856589
which is the correct ordering.
Upvotes: 0