manjunatha

Reputation: 1

Experimental comparison and variants

Could anybody explain this code from Luis Torgo (the DMwR package):

cv.rpart <- function(form, train, test, ...) {
  m   <- rpartXse(form, train, ...)
  p   <- predict(m, test)
  mse <- mean( (p-resp(form,test))^2 )
  c(  nmse=mse/mean( (mean(resp(form,train))-resp(form,test))^2 )  )
}
cv.lm <- function(form, train, test, ...) {
  m   <- lm(form, train,...)
  p   <- predict(m, test)
  p   <- ifelse(p<0, 0, p)
  mse <- mean( (p-resp(form,test))^2 )
  c(  nmse=mse/mean( (mean(resp(form,train))-resp(form,test))^2 )  )
}

res <- experimentalComparison(c(dataset(a1 ~ .,clean.algae[,1:12],'a1')),
                              c(variants('cv.lm'), variants('cv.rpart',se=c(0,0.5,1))),
                              cvSettings(3,10,1234)
                              )

How will experimentalComparison use cv.rpart and cv.lm?

Upvotes: 0

Views: 220

Answers (1)

matsuo_basho

Reputation: 3020

cv.lm and cv.rpart train a linear model and a regression tree, respectively, on the training partition they receive and return the normalized mean squared error (NMSE) of their predictions on the test partition. experimentalComparison is what actually performs the cross-validation: it calls each of these functions once per train/test split. For the trees, the variants call also supplies three different values of the se pruning parameter, so three tree variants are evaluated. A rough sketch of the per-fold mechanics follows.
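The sketch below is a conceptual illustration only, not the real DMwR internals; it assumes DMwR is loaded and that clean.algae and cv.rpart are defined as above. The idea is that the framework splits the data into folds and calls the user function once per fold with the formula, the two partitions, and the variant's extra parameters, collecting the returned NMSE values.

one.cv.run <- function(learner, form, data, nfolds = 10, seed = 1234, ...) {
  extra <- list(...)          # variant parameters, e.g. se = 0.5
  set.seed(seed)
  # assign each row to one of the folds
  fold.id <- sample(rep(1:nfolds, length.out = nrow(data)))
  sapply(1:nfolds, function(k) {
    train <- data[fold.id != k, ]
    test  <- data[fold.id == k, ]
    # call the user-defined function with the arguments it expects
    do.call(learner, c(list(form, train, test), extra))
  })
}

# ten NMSE values for one of the tree variants
one.cv.run(cv.rpart, a1 ~ ., clean.algae[, 1:12], se = 0.5)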

If you run plot(res) at the end, as Torgo does in his code, you get boxplots of the errors for the 4 models (1 lm + 3 rpart).

I've commented the lines below.

# this function combines training, cross-validation, pruning, prediction,
# and metric calculation
cv.rpart <- function(form, train, test, ...) {
  # rpartXse grows a tree and calculates the cross-validation error
  # at each node.  It then determines the best tree based on
  # the results of this cross-validation.
  # Torgo details how the optimal tree based on 
  # cross-validation results is chosen
  # earlier in his code
  m   <- rpartXse(form, train, ...)   
  # use m to predict on test set
  p   <- predict(m, test)
  # calculates normalized mean square error
  # Refer https://rem.jrc.ec.europa.eu/RemWeb/atmes2/20b.htm
  # for details on NMSE
  mse <- mean( (p-resp(form,test))^2 )
  c(  nmse=mse/mean( (mean(resp(form,train))-resp(form,test))^2 )  )
}
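To see what the se argument (passed through ... to rpartXse) changes, you can fit the same tree with two se values and compare the tree sizes. This is just an illustration, assuming DMwR is loaded and clean.algae is available; larger se values prune more aggressively and usually give smaller trees.

t0 <- rpartXse(a1 ~ ., clean.algae[, 1:12], se = 0)
t1 <- rpartXse(a1 ~ ., clean.algae[, 1:12], se = 1)
# number of nodes in each pruned tree
nrow(t0$frame)
nrow(t1$frame)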


cv.lm <- function(form, train, test, ...) {
  # fit a linear model on the training partition
  m   <- lm(form, train, ...)
  # predict on the test set
  p   <- predict(m, test)
  # algae frequencies cannot be negative, so truncate
  # negative predictions at zero
  p   <- ifelse(p < 0, 0, p)
  # same normalized mean squared error as in cv.rpart
  mse <- mean( (p-resp(form,test))^2 )
  c(  nmse=mse/mean( (mean(resp(form,train))-resp(form,test))^2 )  )
}

# experimentalComparison runs every learner variant you give it
# over every dataset, using the experimental settings you provide

# The arguments of the experimentalComparison function are:
# Dataset class object
# learner class object (contains the learning systems that will be used)
# settings class object
# These datatypes are unique to the DMwR package
# dataset is a function that creates a dataset object (a list)
# each element of the list contains the response variable
# and the actual data
res <- experimentalComparison(
             c(dataset(a1 ~ .,clean.algae[,1:12],'a1')),
             c(variants('cv.lm'), 
              # se specifies the number of standard errors to
              # use in the post-pruning of the tree
              variants('cv.rpart',se=c(0,0.5,1))),
             # cvSettings specifies 3 repetitions of 10-fold
             # cross-validation
             # with a seed of 1234
             cvSettings(3,10,1234)
                              )
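For reference, the two variants() calls expand into four learner objects: cv.lm with its default settings plus one cv.rpart variant per se value. You can evaluate that part on its own to see the names DMwR assigns (something like cv.rpart.v1, cv.rpart.v2, cv.rpart.v3, if I remember the naming correctly):

# the learner variants the comparison will iterate over
c(variants('cv.lm'), variants('cv.rpart', se = c(0, 0.5, 1)))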

summary(res) gives you basic statistics for the cross-validation results of each model.
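Besides summary(res), a couple of DMwR helpers Torgo uses in the book are handy for digging into the results (check the package help pages if the signatures have changed):

plot(res)                        # boxplots of NMSE per variant
bestScores(res)                  # best-scoring variant per statistic
getVariant('cv.rpart.v1', res)   # settings of one particular variant (example name)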

Upvotes: 1
