I'm working with the gam function from the mgcv package in R, and I have a custom function that uses a weights vector to fit the model. My issue is that when I pass a weights vector to the function as an argument, the gam function doesn't properly recognize it unless the vector is explicitly named in the environment.
Here's my function:
check_mod <- function(formula, dist, data, weights_vector = NULL, validation_data = NULL) {
  # Fit with or without prior weights
  if (!is.null(weights_vector)) {
    model <- gam(formula, family = dist, data = data, method = "ML", weights = weights_vector)
  } else {
    model <- gam(formula, family = dist, data = data, method = "ML")
  }
  # Predict on the validation data if supplied, otherwise on the training data
  prediction_data <- if (!is.null(validation_data)) validation_data else data
  predicted <- predict(model, newdata = prediction_data)
  # Root mean squared error, computed inline (same as in the example further down)
  rmse <- sqrt(mean((prediction_data$Chl - predicted)^2))
  model_summary <- summary(model)
  deviance_explained <- model_summary$dev.expl
  degrees_of_freedom <- sum(model_summary$edf)
  ML <- model_summary[["sp.criterion"]][["ML"]]
  return(list(
    aic = AIC(model),
    rmse = rmse,
    dev_expl = deviance_explained,
    edf = degrees_of_freedom,
    ML = ML,
    model = model
  ))
}
This only works if a vector named "weights_vector" already exists in the global environment, in which case the function uses that vector and ignores "weights1". Otherwise the call fails:
res <- check_mod(ind_formula, distributions[[dist_name]], CHL_df, weights_vector = weights1, validation_data = chl_validation)
Error in eval(extras, data, env) : object 'weights_vector' not found
The problem is that the weights vector weights_vector needs to be explicitly defined in the environment, and if I pass it under a different name, the function doesn't recognize it properly. Is there a way to ensure that the weights vector is correctly passed as an argument to the gam function without relying on its name in the environment?
I tried both creating the variable within the dataframe and outside as a vector:
# Load required package
library(mgcv)
# Create a simple dataset
set.seed(123)
data_example <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  Chl = rnorm(100)
)
# Create a weights vector separately
weights1 <- runif(100, 0.5, 1.5)
# Add weights directly into the dataframe
data_example$weights_col <- weights1
# Define the custom function
check_mod <- function(formula, dist, data, weights_vector = NULL, validation_data = NULL) {
  if (!is.null(weights_vector)) {
    model <- gam(formula, family = dist, data = data, method = "ML", weights = weights_vector)
  } else {
    model <- gam(formula, family = dist, data = data, method = "ML")
  }
  prediction_data <- if (!is.null(validation_data)) validation_data else data
  predicted <- predict(model, newdata = prediction_data)
  # Calculate Root Mean Squared Error (RMSE)
  rmse <- sqrt(mean((prediction_data$Chl - predicted)^2))
  model_summary <- summary(model)
  deviance_explained <- model_summary$dev.expl
  degrees_of_freedom <- sum(model_summary$edf)
  ML <- model_summary[["sp.criterion"]][["ML"]]
  return(list(
    aic = AIC(model),
    rmse = rmse,
    dev_expl = deviance_explained,
    edf = degrees_of_freedom,
    ML = ML,
    model = model
  ))
}
# Define the formula and distribution for the GAM model
formula_example <- Chl ~ s(x) + s(y)
distribution_example <- gaussian()
# Attempt to run the function with the separate weights vector
# Fails with: object 'weights_vector' not found
res1 <- check_mod(formula_example, distribution_example, data_example, weights_vector = weights1)
# Attempt to run the function using the weights column from the dataframe
# Fails with the same error
res2 <- check_mod(formula_example, distribution_example, data_example, weights_vector = data_example$weights_col)
# Print the results
print(res1)
print(res2)
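For reference, here is a minimal sketch of one workaround that seems to avoid the name lookup. As far as I can tell, gam() builds its model frame by evaluating the weights expression first in data and then in the environment of the formula (here the global environment), not in the body of the calling function. Copying the weights into the data frame inside the function should therefore make them findable. The helper name fit_weighted and the column name w_internal are hypothetical, chosen only for this illustration.

```r
library(mgcv)

set.seed(123)
dat <- data.frame(x = rnorm(100), y = rnorm(100), Chl = rnorm(100))
w <- runif(100, 0.5, 1.5)

# Hypothetical helper (not part of mgcv): store the weights in a column of
# the local copy of `data`, then refer to that column by name so gam()
# finds it in `data` instead of searching the formula's environment.
fit_weighted <- function(formula, dist, data, weights_vector = NULL) {
  if (is.null(weights_vector)) {
    return(gam(formula, family = dist, data = data, method = "ML"))
  }
  data$w_internal <- weights_vector  # assumed not to clash with an existing column
  gam(formula, family = dist, data = data, method = "ML", weights = w_internal)
}

# The weights are passed under an arbitrary name; no global object called
# weights_vector or w_internal is needed.
fit <- fit_weighted(Chl ~ s(x) + s(y), gaussian(), dat, weights_vector = w)
```

An alternative along the same lines would be to reset the formula's environment inside the function with environment(formula) <- environment(), so that weights_vector is visible during the lookup. I have not compared the two approaches beyond this toy case.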