Paco Herrera

Reputation: 21

How to pass an explicit weights vector to a gam function in R

I'm working with the gam function from the mgcv package in R. I have a wrapper function that takes a weights vector as an argument and passes it on to gam. The problem is that gam does not pick up the vector I pass in: the call only works if a vector with the same name as the argument (weights_vector) already exists in my global environment.

Here's my function:

check_mod <- function(formula, dist, data, weights_vector = NULL, validation_data = NULL) {
  
  if (!is.null(weights_vector)) {
    model <- gam(formula, family = dist, data = data, method = "ML", weights = weights_vector) 
  } else {
    model <- gam(formula, family = dist, data = data, method = "ML")
  }
  
  prediction_data <- if (!is.null(validation_data)) validation_data else data
  
  predicted <- predict(model, newdata = prediction_data)
  rmse <- calc_rmse(prediction_data$Chl, predicted)
  
  model_summary <- summary(model)
  
  deviance_explained <- model_summary$dev.expl
  degrees_of_freedom <- sum(model_summary$edf)
  ML <- model_summary[["sp.criterion"]][["ML"]]
  
  return(list(
    aic = AIC(model), 
    rmse = rmse, 
    dev_expl = deviance_explained, 
    edf = degrees_of_freedom, 
    ML = ML,
    model = model
  ))
}

This does not work unless a vector called "weights_vector" already exists in the environment, in which case the function uses that vector and ignores the one I actually pass ("weights1"). Without it, the call fails:

res <- check_mod(ind_formula, distributions[[dist_name]], CHL_df,  weights_vector  =  weights1,  validation_data = chl_validation)
Error in eval(extras, data, env) : object 'weights_vector' not found

So the weights are looked up by the name weights_vector in the environment rather than taken from the value of the argument, and passing a vector under any other name fails. Is there a way to make sure the weights vector passed as an argument actually reaches gam, regardless of what it is called in the calling environment?
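As far as I can tell from the error message, the lookup happens when the model frame is built: `stats::model.frame()` evaluates `weights` first in `data` and then in the environment of the formula, not in the body of the wrapper. The minimal sketch below reproduces that behaviour. The names `fit_inside`, `d`, `w` and `f` are made up for illustration and are not part of my real code.

```r
library(mgcv)

set.seed(1)
d <- data.frame(x = rnorm(50), Chl = rnorm(50))
w <- runif(50, 0.5, 1.5)

# The formula is created out here, so its environment is NOT the
# environment of fit_inside() below.
f <- Chl ~ s(x)

# Hypothetical wrapper: the weights live in a local argument.
fit_inside <- function(formula, data, weights_vector) {
  gam(formula, data = data, weights = weights_vector)
}

# Capture the error message instead of stopping the script.
err <- tryCatch(
  fit_inside(f, d, w),
  error = function(e) conditionMessage(e)
)
print(err)
```

If this reading is right, `weights_vector` is only found when an object with that name happens to exist where the formula was created, which matches what I see.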

I tried supplying the weights both as a column of the data frame and as a separate vector. Here is a reproducible example:

# Load required package
library(mgcv)

# Create a simple dataset
set.seed(123)
data_example <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  Chl = rnorm(100)
)

# Create a weights vector separately
weights1 <- runif(100, 0.5, 1.5)

# Add weights directly into the dataframe
data_example$weights_col <- weights1

# Define the custom function
check_mod <- function(formula, dist, data, weights_vector = NULL, validation_data = NULL) {
  
  if (!is.null(weights_vector)) {
    model <- gam(formula, family = dist, data = data, method = "ML", weights = weights_vector)
  } else {
    model <- gam(formula, family = dist, data = data, method = "ML")
  }
  
  prediction_data <- if (!is.null(validation_data)) validation_data else data
  predicted <- predict(model, newdata = prediction_data)
  
  # Calculate Root Mean Squared Error (RMSE)
  rmse <- sqrt(mean((prediction_data$Chl - predicted)^2))
  
  model_summary <- summary(model)
  
  deviance_explained <- model_summary$dev.expl
  degrees_of_freedom <- sum(model_summary$edf)
  ML <- model_summary[["sp.criterion"]][["ML"]]
  
  return(list(
    aic = AIC(model), 
    rmse = rmse, 
    dev_expl = deviance_explained, 
    edf = degrees_of_freedom, 
    ML = ML,
    model = model
  ))
}

# Define the formula and distribution for the GAM model
formula_example <- Chl ~ s(x) + s(y)
distribution_example <- gaussian()

# Attempt to run the function with the weights vector
# This fails with: object 'weights_vector' not found
res1 <- check_mod(formula_example, distribution_example, data_example, weights_vector = weights1)

# Attempt to run the function using the weights column within the dataframe
# This fails in the same way, because gam still looks for 'weights_vector' by name
res2 <- check_mod(formula_example, distribution_example, data_example, weights_vector = data_example$weights_col)

# Print the results
print(res1)
print(res2)
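One possible workaround, sketched under the assumption that `gam()` resolves `weights` in `data` before looking anywhere else: copy the weights into the data frame under a fixed column name inside the function and refer to that column. The function name `fit_weighted` and the column name `.fit_weights` are hypothetical, and only the fitting step of check_mod is shown.

```r
library(mgcv)

# Hypothetical variant of the fitting step in check_mod().
fit_weighted <- function(formula, dist, data, weights_vector = NULL) {
  if (is.null(weights_vector)) {
    return(gam(formula, family = dist, data = data, method = "ML"))
  }
  stopifnot(length(weights_vector) == nrow(data))

  # Store the weights in the (local copy of the) data frame so that the
  # name used in `weights =` can be found in `data`.
  data$.fit_weights <- weights_vector
  gam(formula, family = dist, data = data, method = "ML",
      weights = .fit_weights)
}

set.seed(123)
data_example <- data.frame(
  x = rnorm(100),
  y = rnorm(100),
  Chl = rnorm(100)
)
weights1 <- runif(100, 0.5, 1.5)

# The vector can now have any name at the call site.
model_w <- fit_weighted(Chl ~ s(x) + s(y), gaussian(), data_example,
                        weights_vector = weights1)
model_u <- fit_weighted(Chl ~ s(x) + s(y), gaussian(), data_example)
```

The caller's data frame is not modified, since R copies `data` inside the function. An alternative that is sometimes suggested is to reset the formula's environment inside the function with `environment(formula) <- environment()`, but I have not checked how that interacts with other objects the formula might refer to.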

Upvotes: 2

Views: 45

Answers (0)