Miranda

How can I use the F2 score as the metric for genetic-algorithm feature selection (caret's gafs) with a random forest model in R?

I am trying to use the genetic algorithm in the caret R package (gafs) for feature selection, and then use a random forest (ranger) to make the predictions. Instead of accuracy, I would like to use the F2 score as the metric, but I cannot find the right place to swap accuracy out for the F2 score.
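For reference, the F2 score is the F-beta score with β = 2: F_β = (1 + β²)·P·R / (β²·P + R), where P is precision and R is recall. With β = 2 this reduces to 5PR / (4P + R), so recall is weighted more heavily than precision. A minimal base-R sketch, assuming a binary outcome whose first factor level is the positive class (caret's default convention); the helper name `f2_from_factors` is hypothetical:

```r
# Hypothetical helper: F2 score from predicted and observed factors.
# Assumes a binary outcome; the first level of `obs` is the positive class.
f2_from_factors <- function(pred, obs, positive = levels(obs)[1]) {
  tp <- sum(pred == positive & obs == positive)  # true positives
  fp <- sum(pred == positive & obs != positive)  # false positives
  fn <- sum(pred != positive & obs == positive)  # false negatives
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall    <- if (tp + fn > 0) tp / (tp + fn) else 0
  beta2 <- 2^2                                   # beta = 2
  if (precision + recall == 0) return(0)         # avoid 0/0
  (1 + beta2) * precision * recall / (beta2 * precision + recall)
}

obs  <- factor(c("yes", "yes", "yes", "no", "no"), levels = c("yes", "no"))
pred <- factor(c("yes", "no",  "no",  "yes", "no"), levels = c("yes", "no"))
f2_from_factors(pred, obs)  # precision = 1/2, recall = 1/3, F2 = 5/14 (about 0.357)
```

In this toy case F1 would be 0.4, while F2 is pulled towards the lower recall, which is the intended effect of β = 2.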

I tried the following:

library(caret)

F_score <- function(data, lev = NULL, model = NULL) {
  # pos_class is not an argument: it is looked up outside the function
  precision <- posPredValue(data$pred, data$obs, positive = pos_class)
  recall    <- sensitivity(data$pred, data$obs, positive = pos_class)
  # F-beta with beta = 2: (1 + 2^2) * P * R / (2^2 * P + R)
  F2_score  <- (5 * precision * recall) / ((4 * precision) + recall)
  names(F2_score) <- "F_score"
  return(F2_score)
}

control <- gafsControl(functions = rfGA,
                       method = "repeatedcv",
                       repeats = 5,
                       number = 10,
                       metric = c(internal = "F_score", external = "F_score"),
                       allowParallel = TRUE,
                       returnResamp = "final",
                       verbose = TRUE)

trainctrl <- trainControl(classProbs = TRUE, summaryFunction = F_score)

ga <- gafs(x = cv_dataset[, feature_sets[[f]]],
           y = cv_dataset[, "class2predict"],
           trControl = trainctrl,
           iters = 2, # testing phase, should be increased
           gafsControl = control)

However, I get the following error and warning:

Error in { : task 1 failed - "subscript out of bounds"
In addition: Warning message:
In gafs.default(x = cv_dataset[, feature_sets[[f]]], y = cv_dataset[,  :
  The metric 'F_score' is not created by the summary function; 'Accuracy' will be used instead
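As far as I can tell from the caret documentation, the warning comes from how gafs is organised: with `functions = rfGA`, each candidate subset is fit with randomForest (not ranger), the internal fitness comes from the forest's out-of-bag statistics, and the external fitness is computed by `rfGA$fitness_extern`, which returns Accuracy and Kappa (the fallback to 'Accuracy' in the warning is consistent with this). The `summaryFunction` in `trainControl()` is only used when the candidates are fit with `train()`, which is what `caretGA` does. A sketch of one possible fix, under these assumptions: a binary outcome whose first level is the positive class, `caretGA` with `method = "ranger"` passed through to `train()`, and `fitness_extern` replaced by an F2 summary. `F2_summary` and `two_class` are hypothetical names, the two-class iris data stands in for `cv_dataset`, and the tiny `iters`/`popSize`/fold settings are only for a quick run:

```r
library(caret)  # gafs(), gafsControl(), caretGA, trainControl(); method = "ranger" also needs ranger installed

# F2 in caret's summaryFunction format. The first factor level is assumed to be
# the positive class, so no global `pos_class` is needed.
F2_summary <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  precision <- posPredValue(data$pred, data$obs, positive = lev[1])
  recall    <- sensitivity(data$pred, data$obs, positive = lev[1])
  f2 <- (5 * precision * recall) / (4 * precision + recall)
  if (is.na(f2)) f2 <- 0  # NA precision/recall or 0/0 is scored as 0
  c(F_score = f2)
}

# Stand-in data (two-class iris); replace with cv_dataset[, feature_sets[[f]]]
# and cv_dataset[, "class2predict"].
two_class <- droplevels(subset(iris, Species != "setosa"))
x <- two_class[, 1:4]
y <- two_class$Species

# caretGA fits every candidate subset with train(), so train()'s summaryFunction
# drives the internal fitness; fitness_extern drives the external resampling.
ga_funcs <- caretGA
ga_funcs$fitness_extern <- F2_summary

ga_ctrl <- gafsControl(functions = ga_funcs,
                       method = "cv", number = 3,  # e.g. repeatedcv 10 x 5 for real runs
                       metric = c(internal = "F_score", external = "F_score"),
                       maximize = c(internal = TRUE, external = TRUE),
                       allowParallel = FALSE)

set.seed(1)
ga <- gafs(x = x, y = y,
           iters = 2, popSize = 6,  # tiny values for a quick test
           gafsControl = ga_ctrl,
           # the arguments below are passed through `...` to train()
           method = "ranger",
           metric = "F_score",
           tuneLength = 1,
           trControl = trainControl(method = "cv", number = 3,
                                    summaryFunction = F2_summary))

ga$optVariables  # predictors selected by the GA
```

The design choice here is to keep both metrics consistent: `metric = "F_score"` tells `train()` which column to optimise when tuning ranger, and the same function is used for the outer resampling.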

I have tried several variations of this code, but none of them worked; this one seems the most logical, at least to me. In addition, I would like to use a random forest to predict new data using the features selected by the genetic algorithm.
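For predicting new data: the object returned by gafs stores the selected predictor names in `optVariables`, and caret also provides a `predict()` method for gafs objects that, as far as I know, uses the final model refit on those predictors. If you prefer to fit a separate ranger model yourself, a minimal sketch with the ranger package directly follows; `selected` is a hypothetical stand-in for `ga$optVariables`, and a two-class iris split stands in for the real training and new data:

```r
library(ranger)

# Stand-in data: two-class iris split into training rows and "new" rows.
two_class <- droplevels(subset(iris, Species != "setosa"))
set.seed(42)
train_idx <- sample(nrow(two_class), 70)
train_set <- two_class[train_idx, ]
new_data  <- two_class[-train_idx, ]

# Hypothetical stand-in for ga$optVariables (the GA-selected predictors).
selected <- c("Petal.Length", "Petal.Width")

# Fit a random forest on the selected predictors only.
rf <- ranger(Species ~ ., data = train_set[, c(selected, "Species")],
             num.trees = 500)

# predict.ranger() takes new observations via `data`; it needs the selected columns.
pred <- predict(rf, data = new_data)$predictions
table(predicted = pred, observed = new_data$Species)
```

Fitting only on `selected` guarantees the forest sees exactly the GA-chosen features; the new data just has to contain those same columns.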
