Nerd
Nerd

Reputation: 91

error when trying cross-validation for a gamlss model

I am trying to perform a 5-fold cross-validation for a model I estimated with the gamlss package. When I use the same code and estimate another model (e.g. OLS), I have no problems. However, when I change the model to a gamlss, I get an error message.

Here is an illustative example:

# load packages and data
library(caret)
library(gamlss)
data(usair)

# create 5 folds
folds <- createFolds(usair$y, k = 5)

When I run this code, everything works fine and I get a list containing my performance measures for each fold:

### 1) OLS
# estimate model 5 times and get performance measures
res1 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- lm(y~ x1 + x2 + x3 + x4 + x5 + x6,
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

However, when I run this code, changing the model type to a gamlss, I get an error message:

### 2) gamlss
# estimate model 5 times and get performance measures
res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
m1<- gamlss(y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
            data=trainset)
# predict outcomes with the test data set
y_pred <- predict(m1, newdata = testset)
# store the actual outcome values in a vector
y_true <- testset$y
# Store performance measures
MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
R2 <- 1-MSE/var(y_true)
list(MAE=MAE,
     MSE=MSE,
     MAPE=MAPE,
     R2= R2)
})

The error message is: "Error in eval(substitute(Data)) : object 'trainset' not found". I've run the code in the function separately for each fold and it works. It seems like the train- and test set can suddenly not be created anymore. However, all I did was changing the model.

Does anyone have an idea what could be the problem here?

Upvotes: 0

Views: 383

Answers (1)

AugtPelle
AugtPelle

Reputation: 549

You need to specify the formula parameter like this:

res2 <- lapply(folds, function(x) {
  # Create training and test data set
  trainset <- usair[-x, ]
  testset <- usair[x, ]
  # estimate the model with the training data set
  m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
              data=trainset)
  # predict outcomes with the test data set
  y_pred <- predict(m1, newdata = testset)
  # store the actual outcome values in a vector
  y_true <- testset$y
  # Store performance measures
  MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
  MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
  MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
  R2 <- 1-MSE/var(y_true)
  list(MAE=MAE,
       MSE=MSE,
       MAPE=MAPE,
       R2= R2)
})

Results:

> # estimate model 10 times and get performance measures
> res2 <- lapply(folds, function(x) {
+   # Create training and test data set
+   trainset <- usair[-x, ]
+   testset <- usair[x, ]
+   # estimate the model with the training data set
+   m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
+               data=trainset)
+   # predict outcomes with the test data set
+   y_pred <- predict(m1, newdata = testset)
+   # store the actual outcome values in a vector
+   y_true <- testset$y
+   # Store performance measures
+   MAE <-  sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
+   MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error  
+   MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
+   R2 <- 1-MSE/var(y_true)
+   list(MAE=MAE,
+        MSE=MSE,
+        MAPE=MAPE,
+        R2= R2)
+ })
GAMLSS-RS iteration 1: Global Deviance = 281.937 
GAMLSS-RS iteration 2: Global Deviance = 281.9348 
GAMLSS-RS iteration 3: Global Deviance = 281.9348

Upvotes: 1

Related Questions