Fluxy

Reputation: 2978

Error in { : task 1 failed - "undefined columns selected"

I am trying to run caret's recursive feature elimination (rfe) with random forest functions as follows:

library(caret)
library(randomForest)

nfields <- 5
control <- rfeControl(functions = rfFuncs,
                      method = "repeatedcv",
                      repeats = 1,
                      verbose = TRUE)

fields <- colnames(dtrain)[!colnames(dtrain) %in% "my_target"]
predictors_rfe <- rfe(dtrain[, fields, with = FALSE], dtrain$my_target,
                       rfeControl = control)

Output of the run:

+(rfe) fit Fold01.Rep1 size: 120 
-(rfe) fit Fold01.Rep1 size: 120 
+(rfe) imp Fold01.Rep1 
-(rfe) imp Fold01.Rep1 
+(rfe) fit Fold01.Rep1 size:  16 
+(rfe) fit Fold02.Rep1 size: 120 
-(rfe) fit Fold02.Rep1 size: 120 
+(rfe) imp Fold02.Rep1 
-(rfe) imp Fold02.Rep1 
+(rfe) fit Fold02.Rep1 size:  16 
-(rfe) fit Fold02.Rep1 size:  16 
+(rfe) fit Fold02.Rep1 size:   8 
-(rfe) fit Fold02.Rep1 size:   8 
+(rfe) fit Fold02.Rep1 size:   4 
-(rfe) fit Fold02.Rep1 size:   4 
+(rfe) fit Fold03.Rep1 size: 120 
-(rfe) fit Fold03.Rep1 size: 120 
+(rfe) imp Fold03.Rep1 
# ...
+(rfe) fit Fold10.Rep1 size:  16 
-(rfe) fit Fold10.Rep1 size:  16 
+(rfe) fit Fold10.Rep1 size:   8 
-(rfe) fit Fold10.Rep1 size:   8 
+(rfe) fit Fold10.Rep1 size:   4 
-(rfe) fit Fold10.Rep1 size:   4 

Then I get the error:

Error in { : task 1 failed - "undefined columns selected"

From the error message I cannot tell what is wrong. Could anybody help, please?

I found out from here that it is a bug in caret. However, this bug was reported and fixed in 2016, and I am using the latest version of caret.

Upvotes: 1

Views: 1083

Answers (1)

bbiasi

Reputation: 1599

I made an example using iris, following the caret tutorial. The error is probably in this expression:

dtrain[, fields, with = FALSE]

See the example below using iris:

set.seed(1)
library(caret)

nfields <- 5
control <- rfeControl(functions = rfFuncs,
                      method = "repeatedcv",
                      repeats = 1,
                      verbose = FALSE)
irisx <- iris[,1:4]
fields <- colnames(irisx)[!colnames(irisx) %in% "Petal.Width"]

predictors_rfe <- rfe(irisx[,fields], 
                      irisx$Petal.Width,
                      rfeControl = control)

predictors_rfe

Recursive feature selection

Outer resampling method: Cross-Validated (10 fold, repeated 1 times) 

Resampling performance over subset size:

 Variables  RMSE Rsquared    MAE  RMSESD RsquaredSD   MAESD Selected
         3 0.196   0.9418 0.1519 0.03502     0.0177 0.02608        *

The top 3 variables (out of 3):
   Petal.Length, Sepal.Length, Sepal.Width

If you can provide a reproducible example with your dataset, I can check for the cause of the error more thoroughly.
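In the meantime, here is a minimal sketch of one thing worth trying. It assumes that dtrain is a data.table (the `with` argument in the question suggests so, but this is not confirmed) and that passing a plain data.frame to rfe avoids the problem, which I have not verified on your data. The dtrain below is a made-up stand-in, and the column names a and b are hypothetical. The sketch uses only base R and also shows where the message "undefined columns selected" comes from: data.frame indexing raises it when a requested column name is missing.

```r
# Hypothetical stand-in for dtrain; the real data set is not shown in the question.
set.seed(1)
dtrain <- data.frame(a = rnorm(20), b = rnorm(20), my_target = rnorm(20))

# All column names except the target
fields <- setdiff(colnames(dtrain), "my_target")

# Coerce to a plain data.frame before subsetting, so that the
# predictors are indexed with ordinary data.frame semantics.
x <- as.data.frame(dtrain)[, fields, drop = FALSE]
y <- dtrain$my_target

# Sanity check: every requested predictor really exists in x
stopifnot(all(fields %in% colnames(x)))

# data.frame indexing fails when a requested name is absent;
# this captures the resulting error message instead of stopping.
msg <- tryCatch(
  x[, c(fields, "no_such_column")],
  error = function(e) conditionMessage(e)
)
print(msg)
```

With x and y prepared this way, the call would be rfe(x, y, rfeControl = control), using the control object from the question. If the error persists with a plain data.frame, the cause is somewhere else and a reproducible example would be needed.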

Upvotes: 2
