I tried to fit regularized regression models (LASSO, ridge, elastic net) with a leave-one-out cross-validation (LOOCV) split, using cv.glmnet and glmnet.
I used this dataset (the UCI forest fires data); it has 517 rows and 13 columns, two of which are categorical variables. The dependent variable is "area".
I would like to build a model without the categorical variables, and then calculate the mean of the coefficients over the LOOCV runs, as well as the mean R-squared and root mean squared error.
The last 10 rows of the dataset are the following:
tail(data, 10)
X Y month day FFMC DMC DC ISI temp RH wind rain area
508 2 4 aug fri 91.0 166.9 752.6 7.1 25.9 41 3.6 0.0 0.00
509 1 2 aug fri 91.0 166.9 752.6 7.1 25.9 41 3.6 0.0 0.00
510 5 4 aug fri 91.0 166.9 752.6 7.1 21.1 71 7.6 1.4 2.17
511 6 5 aug fri 91.0 166.9 752.6 7.1 18.2 62 5.4 0.0 0.43
512 8 6 aug sun 81.6 56.7 665.6 1.9 27.8 35 2.7 0.0 0.00
513 4 3 aug sun 81.6 56.7 665.6 1.9 27.8 32 2.7 0.0 6.44
514 2 4 aug sun 81.6 56.7 665.6 1.9 21.9 71 5.8 0.0 54.29
515 7 4 aug sun 81.6 56.7 665.6 1.9 21.2 70 6.7 0.0 11.16
516 1 4 aug sat 94.4 146.0 614.7 11.3 25.6 42 4.0 0.0 0.00
517 6 3 nov tue 79.5 3.0 106.7 1.1 11.8 31 4.5 0.0 0.00
My attempted code is as follows:
library(glmnet)  # cv.glmnet, glmnet
library(caret)   # RMSE(), R2()
library(dplyr)   # %>%

set.seed(123)
data <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv")
data <- data[-(3:4)]  # drop the categorical variables month and day
nrFolds <- 517
folds <- rep_len(1:nrFolds, nrow(data))
for (k in 1:nrFolds) {
  fold <- which(folds == k)
  data.train <- data[-fold, ]
  data.test  <- data[fold, ]
  x.train <- as.matrix(data.train[-11])  # "area" is column 11 after the drop
  y.train <- as.matrix(data.train[11])
  x.test  <- as.matrix(data.test[-11])
  y.test  <- as.matrix(data.test[11])
  cv <- cv.glmnet(x.train, y.train, alpha = 0)
  # cv$lambda.min
  model <- glmnet(x.train, y.train, alpha = 0, lambda = cv$lambda.min)
  coef(model)
  predictions <- model %>% predict(x.test) %>% as.vector()
  RMSE <- RMSE(predictions, data.test$area)
  Rsquare <- R2(predictions, data.test$area)
  LL <- list(cv, model, coef(model), predictions, RMSE, Rsquare)
}
LL
The code gives me only one prediction value, and R-squared is NA. Also, I am not sure whether using a leave-one-out cross-validation split with cv.glmnet and glmnet is right at all.
Any help is appreciated.
Updated
I tried to use the caret package. I first split the dataset with LOOCV because I want every single observation to take part in the test process. Then I used the train function to carry out my initial idea from the question.
My code:
data <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/forest-fires/forestfires.csv")
data <- data[-(3:4)]
lambda <- 10^seq(-3, 3, length = 100)
predictions <- rep(0, nrow(data))
set.seed(123)
for (i in 1:nrow(data)) {
  test.data  <- data[i, ]
  train.data <- data[-i, ]
  ridge <- train(
    area ~ ., data = train.data, method = "glmnet",
    trControl = trainControl("loocv", number = 517),
    tuneGrid = expand.grid(alpha = 0, lambda = lambda))
  coefs <- coef(ridge$finalModel, ridge$bestTune$lambda)
  predictions[i] <- ridge %>% predict(test.data)
  RMSE <- RMSE(predictions, test.data$area)
  Rsquare <- R2(predictions, test.data$area)
  LL <- list(RMSE, Rsquare, coefs, predictions)
}
This code also gives me the following error:
Error in cor(obs, pred, use = ifelse(na.rm, "complete.obs", "everything")) :
incompatible dimensions
In addition: Warning message:
In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
There were missing values in resampled performance measures.
Further update
I also wrote the following code using the caret package (ridge, for example):
set.seed(123)
data <- read.csv("forestfires.csv")
data <- data[-(3:4)]
lambda.grid <- 10^seq(10, -2, length = 100)
alpha.grid <- 0  # ridge
control <- trainControl(method = "LOOCV")
srchGrd <- expand.grid(alpha = alpha.grid, lambda = lambda.grid)
lm_model <- train(area ~ ., data = data, trControl = control, tuneGrid = srchGrd, method = "glmnet")
lm_model
coef(lm_model$finalModel, lm_model$bestTune$lambda)
Am I right now?
OK, so now that you have stated it, the problem is quite obvious. Each time through the loop you assign a new value to LL, so LL doesn't hold all the results, just the last one. Try:
LL <- lapply(1:nrFolds, function(k) {
  fold <- which(folds == k)
  data.train <- data[-fold, ]
  data.test  <- data[fold, ]
  x.train <- as.matrix(data.train[-11])
  y.train <- as.matrix(data.train[11])
  x.test  <- as.matrix(data.test[-11])
  y.test  <- as.matrix(data.test[11])
  cv <- cv.glmnet(x.train, y.train, alpha = 0)
  # cv$lambda.min
  model <- glmnet(x.train, y.train, alpha = 0, lambda = cv$lambda.min)
  predictions <- model %>% predict(x.test) %>% as.vector()
  RMSE <- RMSE(predictions, data.test$area)
  Rsquare <- R2(predictions, data.test$area)
  list(CV = cv, model = model, coefs = coef(model), preds = predictions, rmse = RMSE, r2 = Rsquare)
})
However, please look into the caret package: it automates the creation of folds and the testing. Also, I would calculate the RMSE and R² post hoc.
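For example, a minimal sketch of the fold helper (createFolds returns lists of indices; the k = 10 here is just an illustrative choice, not something from the question):

library(caret)
# returnTrain = TRUE gives the training-set indices for each fold
folds <- createFolds(data$area, k = 10, returnTrain = TRUE)
str(folds, max.level = 1)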
EDIT: Yes, R2 is NA. This is because the above code attempts leave-one-out CV: rep_len as used here returns the same as 1:517, so fold is just a single number between 1 and 517, each data.train has 516 rows, and data.test is a single row. Therefore, predictions is a vector of length 1 and R2 cannot be calculated. The predictions themselves, however, are returned correctly:
preds <- sapply(LL, function(x) x$preds)
Not that they are very good, but the question was about running glmnet, not about how to get the best predictions.
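With all 517 held-out predictions in hand, the post-hoc metrics mentioned above, and the coefficient mean the question asked for, fall out directly (a sketch, assuming the LL and preds objects from above are still in scope):

# post-hoc metrics over the full vector of held-out predictions
sqrt(mean((preds - data$area)^2))  # overall RMSE
cor(preds, data$area)^2            # overall R-squared

# mean coefficient across the LOOCV fits (the question's original goal)
coef.mat <- sapply(LL, function(x) as.matrix(x$coefs))  # one column per fold
rownames(coef.mat) <- rownames(LL[[1]]$coefs)
rowMeans(coef.mat)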
EDIT 2: regarding your updated code, here is what happens. You first initialize the vector predictions with 517 zeroes. In the first iteration of your loop, you tell caret to run LOOCV on a training set that contains 516 samples. caret then returns the optimized ridge model (plus a lot of information which you ignore, such as the RMSE for each parameter value tested). Then you make predictions on your test set, which is one sample. You enter this single sample into the predictions vector, which now contains 1 prediction and 516 zeroes. Then you try to calculate R2 and RMSE between predictions (a vector of one prediction and 516 zeroes) and the test response (a vector of one value), which, not unexpectedly, fails. Finally, you store it all in a list called LL, which gets overwritten the next time the loop runs.
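A minimal repair of that loop, for illustration: collect the predictions first and compute the metrics once, after the loop, so both vectors have length 517. (I substitute a 10-fold inner CV for the nested LOOCV purely to keep the run time sane; that choice is mine, not from the question.)

library(caret)
set.seed(123)
lambda <- 10^seq(-3, 3, length = 100)
predictions <- numeric(nrow(data))
for (i in seq_len(nrow(data))) {
  # tune ridge on all rows except i, then predict the held-out row
  fit <- train(area ~ ., data = data[-i, ], method = "glmnet",
               trControl = trainControl(method = "cv", number = 10),
               tuneGrid = expand.grid(alpha = 0, lambda = lambda))
  predictions[i] <- predict(fit, newdata = data[i, ])
}
RMSE(predictions, data$area)  # both vectors now have length 517
R2(predictions, data$area)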
How I would do it: set aside 33% of the data as a validation set. Use the remaining 67% with caret to train a single model (with LOOCV or k-fold CV in caret to optimize the parameters). Inspect the output of caret; make sure you look at the RMSE which caret provides (and R², if you use something other than LOOCV). Then test the performance of your model on the validation set.
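A sketch of that workflow (createDataPartition is caret's splitter; 10-fold CV is chosen here so that caret also reports R²):

library(caret)
set.seed(123)
idx <- createDataPartition(data$area, p = 2/3, list = FALSE)
train.data <- data[idx, ]   # ~67% for training and tuning
valid.data <- data[-idx, ]  # ~33% held out for validation
fit <- train(area ~ ., data = train.data, method = "glmnet",
             trControl = trainControl(method = "cv", number = 10),
             tuneGrid = expand.grid(alpha = 0, lambda = 10^seq(-3, 3, length = 100)))
fit$results  # caret's resampled RMSE and R-squared for every lambda
valid.pred <- predict(fit, newdata = valid.data)
RMSE(valid.pred, valid.data$area)  # final check on the held-out set
R2(valid.pred, valid.data$area)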