Reputation: 91
I am trying to perform a 5-fold cross-validation for a model I estimated with the gamlss package. When I use the same code and estimate another model (e.g. OLS), I have no problems. However, when I change the model to a gamlss, I get an error message.
Here is an illustative example:
# load packages and data
library(caret)
library(gamlss)
data(usair)
# create 5 folds
folds <- createFolds(usair$y, k = 5)
When I run this code, everything works fine and I get a list containing my performance measures for each fold:
### 1) OLS
# estimate model 5 times and get performance measures
res1 <- lapply(folds, function(x) {
# Create training and test data set
trainset <- usair[-x, ]
testset <- usair[x, ]
# estimate the model with the training data set
m1<- lm(y~ x1 + x2 + x3 + x4 + x5 + x6,
data=trainset)
# predict outcomes with the test data set
y_pred <- predict(m1, newdata = testset)
# store the actual outcome values in a vector
y_true <- testset$y
# Store performance measures
MAE <- sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error
MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
R2 <- 1-MSE/var(y_true)
list(MAE=MAE,
MSE=MSE,
MAPE=MAPE,
R2= R2)
})
However, when I run this code, changing the model type to a gamlss, I get an error message:
### 2) gamlss
# estimate model 5 times and get performance measures
res2 <- lapply(folds, function(x) {
# Create training and test data set
trainset <- usair[-x, ]
testset <- usair[x, ]
# estimate the model with the training data set
m1<- gamlss(y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
data=trainset)
# predict outcomes with the test data set
y_pred <- predict(m1, newdata = testset)
# store the actual outcome values in a vector
y_true <- testset$y
# Store performance measures
MAE <- sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error
MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
R2 <- 1-MSE/var(y_true)
list(MAE=MAE,
MSE=MSE,
MAPE=MAPE,
R2= R2)
})
The error message is: "Error in eval(substitute(Data)) : object 'trainset' not found". I've run the code in the function separately for each fold and it works. It seems like the train- and test set can suddenly not be created anymore. However, all I did was changing the model.
Does anyone have an idea what could be the problem here?
Upvotes: 0
Views: 383
Reputation: 549
You need to specify the formula parameter like this:
res2 <- lapply(folds, function(x) {
# Create training and test data set
trainset <- usair[-x, ]
testset <- usair[x, ]
# estimate the model with the training data set
m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
data=trainset)
# predict outcomes with the test data set
y_pred <- predict(m1, newdata = testset)
# store the actual outcome values in a vector
y_true <- testset$y
# Store performance measures
MAE <- sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error
MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
R2 <- 1-MSE/var(y_true)
list(MAE=MAE,
MSE=MSE,
MAPE=MAPE,
R2= R2)
})
Results:
> # estimate model 10 times and get performance measures
> res2 <- lapply(folds, function(x) {
+ # Create training and test data set
+ trainset <- usair[-x, ]
+ testset <- usair[x, ]
+ # estimate the model with the training data set
+ m1<- gamlss(formula=y~ri(x.vars=c("x1","x2","x3","x4","x5","x6"), Lp =1),
+ data=trainset)
+ # predict outcomes with the test data set
+ y_pred <- predict(m1, newdata = testset)
+ # store the actual outcome values in a vector
+ y_true <- testset$y
+ # Store performance measures
+ MAE <- sum(abs(y_true-y_pred))/length(y_true) # Mean Absolute Error
+ MSE <- sum((y_true-y_pred)^2)/length(y_true) # Mean Squared Error
+ MAPE <- 100*sum(abs(y_true-y_pred)/y_true)/length(y_true) # Mean Absolute Percentage Error
+ R2 <- 1-MSE/var(y_true)
+ list(MAE=MAE,
+ MSE=MSE,
+ MAPE=MAPE,
+ R2= R2)
+ })
GAMLSS-RS iteration 1: Global Deviance = 281.937
GAMLSS-RS iteration 2: Global Deviance = 281.9348
GAMLSS-RS iteration 3: Global Deviance = 281.9348
Upvotes: 1