Reputation: 3015

random forest with specified false positive and sensitivity

Using the randomForest package in R, I was able to train a random forest that minimizes the overall error rate. However, what I want is to train two random forests: one that first minimizes the false positive rate (~0) and then the overall error rate, and one that first maximizes sensitivity (~1) and then minimizes the overall error rate. Another way to state the problem: given a target false positive rate and a target sensitivity, train two random forests, each satisfying one of the targets, and then minimize the overall error rate. Does anyone know of an R package, a Python package, or any other software that does this, or how to do it? Thanks for the help.

Upvotes: 2

Answers (3)

Maggie

Reputation: 125

I believe random forests produce, for each observation, the proportion of trees in the forest that voted for each class. By default, the class is assigned by plurality vote. If you'd like to bias your model to reduce false positives or false negatives specifically, you can adjust the threshold for predicting each class. In randomForest in R, use the cutoff argument (a vector with one entry per class).

I found this post helpful:

https://stats.stackexchange.com/questions/112388/how-to-change-threshold-for-classification-in-r-randomforests
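To make the threshold idea concrete, here is a minimal sketch in plain Python. It assumes you already have, for each held-out observation, the fraction of trees voting for the positive class (for example the vote matrix from randomForest, or `predict_proba` in scikit-learn). The function names `confusion_rates` and `pick_threshold` and the toy data are made up for illustration; this is not the randomForest API.

```python
def confusion_rates(scores, labels, threshold):
    """Return (false_positive_rate, sensitivity, error_rate) when an
    observation is predicted positive if its score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    error_rate = (fp + fn) / len(labels)
    return fpr, sensitivity, error_rate


def pick_threshold(scores, labels, max_fpr=None, min_sensitivity=None):
    """Among thresholds that satisfy the given constraint(s), return
    (threshold, error_rate, fpr, sensitivity) with the lowest error rate,
    or None if no threshold is feasible."""
    # Every observed score is a candidate; +inf means "never predict positive".
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fpr, sensitivity, error_rate = confusion_rates(scores, labels, t)
        if max_fpr is not None and fpr > max_fpr:
            continue
        if min_sensitivity is not None and sensitivity < min_sensitivity:
            continue
        if best is None or error_rate < best[1]:
            best = (t, error_rate, fpr, sensitivity)
    return best


# Toy held-out vote fractions and true labels (invented for illustration).
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

low_fp = pick_threshold(scores, labels, max_fpr=0.0)             # threshold 0.7
high_sens = pick_threshold(scores, labels, min_sensitivity=1.0)  # threshold 0.35
```

Note that this reuses one trained forest and only moves the decision threshold, so it gives two operating points rather than two separately trained models. The threshold should be chosen on held-out or out-of-bag votes, not on the training predictions.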

Upvotes: 0

fatih

Reputation: 1395

You can do a grid search over the 'regularization' parameters to find the setting that best matches your target behavior.

Parameters of interest:

max depth
number of features
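A rough sketch of what such a search could look like when the target is "false positive rate below some limit, then lowest error". Everything here is an assumption for illustration: `constrained_grid_search` is a hypothetical helper, `evaluate` stands in for "train a forest with these parameters and measure it on validation data", and the numbers in `toy_evaluate` are invented, not real results. The parameter names follow scikit-learn's `max_depth` and `max_features`; randomForest in R names its parameters differently.

```python
from itertools import product


def constrained_grid_search(grid, evaluate, max_fpr):
    """Try every parameter combination in `grid` (dict: name -> list of values).

    `evaluate(params)` must return (false_positive_rate, error_rate) measured
    on held-out data. Returns (params, fpr, error_rate) for the feasible
    combination with the lowest error rate, or None if none is feasible.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        fpr, error_rate = evaluate(params)
        if fpr > max_fpr:
            continue  # violates the false positive constraint
        if best is None or error_rate < best[2]:
            best = (params, fpr, error_rate)
    return best


def toy_evaluate(params):
    """Stand-in for training and validating a forest; values are made up."""
    table = {
        (2, 1): (0.00, 0.30),
        (2, 2): (0.02, 0.25),
        (4, 1): (0.01, 0.20),
        (4, 2): (0.08, 0.15),
    }
    return table[(params["max_depth"], params["max_features"])]


grid = {"max_depth": [2, 4], "max_features": [1, 2]}
best = constrained_grid_search(grid, toy_evaluate, max_fpr=0.02)
# best[0] is {"max_depth": 4, "max_features": 1} for this toy table
```

The same loop works for a sensitivity target by having `evaluate` return sensitivity and flipping the comparison.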

Upvotes: 0

Ping Jin

Reputation: 520

This is a workaround that may be worth trying. (Sorry that I do not have enough reputation to put it as a comment.)

sensitivity = TP/(TP + FN)
specificity = TN/(TN + FP)
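accuracy = (TP + TN)/(TP + FN + TN + FP)
ER = 1 - accuracy = (FP + FN)/(TP + FN + TN + FP)

(Notation from Sensitivity_and_specificity)

If you duplicate some positive (or negative) samples, or increase their weights, accuracy approaches sensitivity (or specificity), so minimizing ER then approximately maximizes that rate.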

So if you want to maximize sensitivity, you can oversample or duplicate positive samples in the dataset and then train your RF on it. To maximize specificity, do the same with negative samples.
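A small numeric check of the argument, as a sketch. It takes accuracy as (TP + TN)/(TP + FN + TN + FP) and counts each positive sample `pos_weight` times, which is equivalent to duplicating the positives. The function name `weighted_accuracy` and the confusion counts are invented for illustration, and the counts are held fixed, whereas a forest retrained on the reweighted data would of course produce a different confusion matrix.

```python
def weighted_accuracy(tp, fn, tn, fp, pos_weight=1.0):
    """Accuracy when every positive sample counts `pos_weight` times."""
    correct = pos_weight * tp + tn
    total = pos_weight * (tp + fn) + tn + fp
    return correct / total


# Invented confusion counts: sensitivity = 80 / (80 + 20) = 0.8
tp, fn, tn, fp = 80, 20, 50, 50

plain = weighted_accuracy(tp, fn, tn, fp)                    # 130 / 200 = 0.65
heavy = weighted_accuracy(tp, fn, tn, fp, pos_weight=1000)   # close to 0.8
```

As `pos_weight` grows, the TN and FP terms become negligible and the weighted accuracy tends to TP/(TP + FN), which is the sensitivity.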

Upvotes: 0
