I am trying to use the genetic algorithm from the caret R package (gafs) for feature selection, and then use a random forest (ranger) to make the predictions. Instead of accuracy, I would like to use the F2 score as the metric, but I cannot find the right place to swap accuracy for the F2 score.
I tried the following:
    F_score <- function(data, lev = NULL, model = NULL) {
      precision <- posPredValue(data$pred, data$obs, positive = pos_class)
      recall <- sensitivity(data$pred, data$obs, positive = pos_class)
      F2_score <- (5 * precision * recall) / ((4 * precision) + recall)
      names(F2_score) <- "F_score"
      return(F2_score)
    }
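For reference, the formula on the F2_score line is the beta = 2 case of the general F-beta score, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with P the precision and R the recall. Below is a minimal base-R sketch of that formula, independent of caret. The name `f_beta`, its arguments, and the choice to return 0 when precision or recall is undefined are assumptions made for illustration only.

```r
# Hypothetical helper (not part of caret): F-beta score from predicted and
# observed class labels. beta = 2 weights recall more heavily than precision.
f_beta <- function(pred, obs, positive, beta = 2) {
  tp <- sum(pred == positive & obs == positive)  # true positives
  fp <- sum(pred == positive & obs != positive)  # false positives
  fn <- sum(pred != positive & obs == positive)  # false negatives

  # Assumption: treat an undefined precision or recall as a score of 0,
  # so that a search procedure never sees NaN as a fitness value.
  if (tp + fp == 0 || tp + fn == 0 || tp == 0) {
    return(0)
  }

  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}

# Small made-up example: 2 true positives, 1 false positive, 1 false negative.
pred <- c("yes", "yes", "no", "no", "yes")
obs  <- c("yes", "no", "yes", "no", "yes")
f2 <- f_beta(pred, obs, positive = "yes", beta = 2)
```

With beta = 2 the general expression reduces to (5 * P * R) / (4 * P + R), which is the form used in the function above.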
    control <- gafsControl(functions = rfGA,
                           method = "repeatedcv",
                           repeats = 5,
                           number = 10,
                           metric = c(internal = "F_score", external = "F_score"),
                           allowParallel = TRUE,
                           returnResamp = "final",
                           verbose = TRUE)
    trainctrl <- trainControl(classProbs = TRUE, summaryFunction = F_score)
    ga <- gafs(x = cv_dataset[, feature_sets[[f]]],
               y = cv_dataset[, "class2predict"],
               trControl = trainctrl,
               iters = 2, # testing phase, should be increased
               gafsControl = control)
However, this fails with the following error and warning:
Error in { : task 1 failed - "subscript out of bounds"
In addition: Warning message:
In gafs.default(x = cv_dataset[, feature_sets[[f]]], y = cv_dataset[, :
The metric 'F_score' is not created by the summary function; 'Accuracy' will be used instead
I tried several other variations of this code, but none of them worked, and this one seems the most logical to me. In addition, I would like to use a random forest as the model to predict new data, using the features selected by the genetic algorithm.
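One possible reading of the warning, as far as I understand caret's genetic algorithm documentation: gafs() seems to take its external performance values from the `fitness_extern` element of the `functions` list (for `rfGA` this appears to be `defaultSummary`, which reports Accuracy and Kappa), not from the `summaryFunction` given to `trainControl`. Under that assumption, a sketch of a variation would be to copy `rfGA` and replace its `fitness_extern`. The names `f2_summary` and `rf_ga_f2` are made up here, the positive class is assumed to be the first factor level, and keeping `"Accuracy"` as the internal metric assumes that `rfGA`'s internal fitness reports the out-of-bag accuracy.

```r
library(caret)

# Hypothetical summary function with the same signature as a train()
# summaryFunction. Assumption: the positive class is the first level in `lev`.
f2_summary <- function(data, lev = NULL, model = NULL) {
  precision <- posPredValue(data$pred, data$obs, positive = lev[1])
  recall <- sensitivity(data$pred, data$obs, positive = lev[1])
  f2 <- (5 * precision * recall) / ((4 * precision) + recall)
  if (is.na(f2)) f2 <- 0  # assumption: score undefined cases as 0
  c(F_score = f2)
}

# Copy the built-in random forest helper list and swap in the F2 summary
# for the external (resampled) fitness.
rf_ga_f2 <- rfGA
rf_ga_f2$fitness_extern <- f2_summary

control <- gafsControl(functions = rf_ga_f2,
                       method = "repeatedcv",
                       repeats = 5,
                       number = 10,
                       # Assumption: rfGA's internal fitness is the OOB
                       # Accuracy, so only the external metric is changed.
                       metric = c(internal = "Accuracy", external = "F_score"),
                       maximize = c(internal = TRUE, external = TRUE),
                       allowParallel = TRUE,
                       returnResamp = "final",
                       verbose = TRUE)

# Quick check of the summary function on made-up predictions.
toy <- data.frame(
  pred = factor(c("yes", "yes", "no", "no", "yes"), levels = c("yes", "no")),
  obs  = factor(c("yes", "no", "yes", "no", "yes"), levels = c("yes", "no"))
)
toy_score <- f2_summary(toy, lev = levels(toy$obs))
```

The `control` object would then be passed to gafs() as `gafsControl = control`, as in the call above. I have not verified whether this resolves the "subscript out of bounds" error, which may have a separate cause.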