Reputation: 4926
I have a training_predictors set with 56 columns, all of which are numeric. training_labels is a factor vector of 0 and 1.
I am using the following vector of subset sizes to be tested:
subset_sizes <- c(1:5, 10, 15, 20, 25)
The following is my modified version of the rfFuncs list of functions:
rfRFE <- list(summary = defaultSummary,
              fit = function(x, y, first, last, ...) {
                library(randomForest)
                randomForest(x, y, importance = first, ...)
              },
              pred = function(object, x) predict(object, x),
              rank = function(object, x, y) {
                vimp <- varImp(object)
                vimp <- vimp[order(vimp$Overall, decreasing = TRUE), , drop = FALSE]
                vimp$var <- rownames(vimp)
                vimp
              },
              selectSize = pickSizeBest,
              selectVar = pickVars)
I have declared the control function as:
rfeCtrl <- rfeControl(functions = rfRFE,
                      method = "cv",
                      number = 10,
                      verbose = TRUE)
But when I run the rfe function as shown below,
rfProfile <- rfe(training_predictors,
                 training_labels,
                 sizes = subset_sizes,
                 rfeControl = rfeCtrl)
I get the following error:
Error in { : task 1 failed - "argument 1 is not a vector"
I also tried changing the vector subset_sizes, but still no luck. What am I doing wrong?
Update: I tried running these steps one by one, and the problem seems to be in the rank function. But I still cannot figure out what the problem is.
Update: I found the problem. The data frame returned by varImp in the rank function does not contain an $Overall column. Instead, it contains columns named 0 and 1. Why is that? What do 0 and 1 signify (both columns hold exactly the same values, by the way)? Also, how can I make varImp return an $Overall column? [As a temporary workaround, I am creating a new $Overall column and attaching it to vimp in the rank function.]
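For reference, the temporary workaround described above could be sketched roughly as follows. This is only an illustration under assumptions: add_overall is a hypothetical helper (not part of caret), and it assumes the per-class columns are named after the levels of the outcome factor and that averaging them is an acceptable overall score. The toy data frame stands in for the output of varImp.

```r
# Hypothetical helper (not a caret function): make sure a varImp-style
# data frame has an Overall column, then sort it the way rank expects.
add_overall <- function(vimp, y) {
  vimp <- as.data.frame(vimp)
  if (!"Overall" %in% colnames(vimp)) {
    # Assumption: per-class importance columns are named after levels(y)
    class_cols <- intersect(levels(y), colnames(vimp))
    if (length(class_cols) == 0) {
      stop("no Overall column and no per-class columns found")
    }
    # Average the per-class scores into a single Overall score
    vimp$Overall <- rowMeans(vimp[, class_cols, drop = FALSE])
  }
  vimp <- vimp[order(vimp$Overall, decreasing = TRUE), , drop = FALSE]
  vimp$var <- rownames(vimp)
  vimp
}

# Toy stand-in for varImp output with columns named "0" and "1"
toy_vimp <- data.frame(`0` = c(1.5, 4.0, 2.5),
                       `1` = c(2.5, 6.0, 0.5),
                       row.names = c("a", "b", "c"),
                       check.names = FALSE)
toy_y <- factor(c(0, 1, 1, 0))

ranked <- add_overall(toy_vimp, toy_y)
```

Inside the rank function, the first line would then become something like vimp <- add_overall(varImp(object), y), with the ordering and var steps already handled by the helper.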
Upvotes: 5
Views: 2828
Reputation: 1
I found a solution to this same issue when fitting a logistic regression model with rfe in caret. The solution is below:
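glmFuncs$rank <- function(object, x, y) {
  # Attach dplyr so that %>%, mutate() and arrange() are available;
  # loadNamespace() alone does not put them on the search path.
  library(dplyr)
  vimp <- varImp(object, scale = FALSE)
  # Assumes varImp() returns an object with an $importance data frame here
  vimp <- vimp$importance %>%
    mutate(var = row.names(.)) %>%
    arrange(-Overall)  # sort by decreasing importance
  vimp
}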
Upvotes: 0
Reputation: 14316
Using 0 and 1 as factor levels is problematic since those are not valid R column names. In your other SO post you probably would have received a message about using these as factor levels for your output.
Try using a factor outcome with some more informative levels that can be translated into valid R column names (for class probabilities).
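A minimal sketch of that recoding, using only base R. The labels "neg" and "pos" and the helper levels_ok are illustrative choices, not anything caret requires, and the short training_labels vector here is a stand-in for the real one.

```r
# Stand-in for the real outcome vector: a factor with levels "0" and "1"
training_labels <- factor(c(0, 1, 1, 0, 1))

# Hypothetical helper: TRUE when every level is already a valid R name,
# i.e. make.names() would leave it unchanged
levels_ok <- function(f) all(make.names(levels(f)) == levels(f))

before <- levels_ok(training_labels)

# Recode to more informative levels ("neg"/"pos" are illustrative)
relabelled <- factor(training_labels,
                     levels = c("0", "1"),
                     labels = c("neg", "pos"))

after <- levels_ok(relabelled)
```

Here before is FALSE because make.names("0") yields "X0", while after is TRUE; the recoded factor can then be passed to rfe in place of the original labels.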
Upvotes: 4