Fredrik Nylén

Reputation: 577

Columns not available when training a lasso model using caret

I am getting an odd error

Error in `[.data.frame`(data, , lvls[1]) : undefined columns selected

message when using caret to train a glmnet model. I have used basically the same code and the same predictors for an ordinal model (just with a different factor y) and it worked fine (it took 400 core hours to compute, so I can't show it here).

# Load the required packages
library(caret)
library(dplyr)

# Source a small subset of data
source("https://gist.githubusercontent.com/FredrikKarlssonSpeech/ebd9fccf1de6789a3f529cafc496a90c/raw/efc130e41c7d01d972d1c69e59bf8f5f5fea58fa/voice.R")
trainIndex <- createDataPartition(notna$RC, p = .75, 
                                  list = FALSE, 
                                  times = 1)


training <- notna[ trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)
testing  <- notna[-trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)

fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel=TRUE,
  savePredictions="final",
  summaryFunction=twoClassSummary)

vtCVFit <- train(x=training[-1],y=training[,"RC"], 
                  method = "glmnet", 
                  trControl = fitControl,
                  preProcess=c("center", "scale"),
                  metric="Kappa"
)

I can't find anything obviously wrong with the data. There are no NAs:

table(is.na(training))

FALSE 
43166

and I don't see why it would try to index outside the number of columns.

Any suggestions?

Upvotes: 8

Views: 280

Answers (2)

Alex Yahiaoui Martinez

Reputation: 994

You have to remove summaryFunction = twoClassSummary from your trainControl(). twoClassSummary computes ROC-based metrics from class probabilities, so it expects a column named after the first class level; since you did not set classProbs = TRUE, that column does not exist and the indexing fails with "undefined columns selected". With the summary function removed it works for me:

fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel = TRUE,
  savePredictions = "final")

vtCVFit <- train(x = training[-1], y = training[, "RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess = c("center", "scale"),
                 metric = "Kappa")

print(vtCVFit)

#glmnet 

#113 samples
#381 predictors
#  2 classes: 'NVT', 'VT' 

#Pre-processing: centered (381), scaled (381) 
#Resampling: Bootstrapped (25 reps) 
#Summary of sample sizes: 113, 113, 113, 113, 113, 113, ... 
#Resampling results across tuning parameters:

#  alpha  lambda      Accuracy   Kappa    
#  0.10   0.01113752  0.5778182  0.1428393
#  0.10   0.03521993  0.5778182  0.1428393
#  0.10   0.11137520  0.5778182  0.1428393
#  0.55   0.01113752  0.5778182  0.1428393
#  0.55   0.03521993  0.5748248  0.1407333
#  0.55   0.11137520  0.5749980  0.1136131
#  1.00   0.01113752  0.5815391  0.1531280
#  1.00   0.03521993  0.5800217  0.1361240
#  1.00   0.11137520  0.5939621  0.1158007

#Kappa was used to select the optimal model using the largest value.
#The final values used for the model were alpha = 1 and lambda = 0.01113752.
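Alternatively, if you want to keep twoClassSummary (it optimizes ROC/Sens/Spec rather than Kappa), trainControl() also needs classProbs = TRUE so the class-probability columns exist, and train() needs a metric that the summary function actually reports. A minimal sketch on synthetic stand-in data (the voice dataset from the question is assumed unavailable; the toy variable names are illustrative):

```r
library(caret)

# Illustrative two-class data standing in for the voice dataset
set.seed(1)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100),
                  RC = factor(sample(c("NVT", "VT"), 100, replace = TRUE)))

fitControl <- trainControl(method = "cv",
                           number = 10,
                           classProbs = TRUE,              # required by twoClassSummary
                           summaryFunction = twoClassSummary,
                           savePredictions = "final")

fit <- train(RC ~ ., data = toy,
             method = "glmnet",
             trControl = fitControl,
             preProcess = c("center", "scale"),
             metric = "ROC")                               # twoClassSummary reports ROC, Sens, Spec
```

Note that with classProbs = TRUE the factor levels must be syntactically valid R names ("NVT" and "VT" are fine), because they become column names in the predictions.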

Upvotes: 5

Shirin Yavari

Reputation: 682

Change your factors to character with the following code and see if it works:

training <- data.frame(lapply(training, as.character), stringsAsFactors = FALSE)

I would have left this suggestion as a comment, but I wasn't able to (since I have less than 50 reputation!).

Upvotes: 2
