user3803714

Reputation: 5389

MLlib: Calculating Precision and Recall for multiple threshold values

I set the threshold value of my logistic regression model to 0.5 before using it for scoring. I now want to get the precision, recall, and F1 score at that threshold. Unfortunately, when I try, the only threshold values I see are 1.0 and 0.0. How do I get metrics for threshold values other than 0 and 1?

For example, here is the output:

Threshold is: 1.0, Precision is: 0.85

Threshold is: 0.0, Precision is: 0.312641

I don't get the precision for threshold 0.5. Here is the relevant code.

// Imports needed by the snippets below.
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.regression.LabeledPoint

// Set the threshold value of the logistic regression model.
model.setThreshold(0.5)

// Compute the score and generate an RDD with prediction and label values.  
val predictionAndLabels = data.map { 
  case LabeledPoint(label, features) => (model.predict(features), label)
}

// I now want to compute the precision and recall and other metrics. Since I have set the model threshold to 0.5, I want to get PR at that value.

val metrics = new BinaryClassificationMetrics(predictionAndLabels)
val precision = metrics.precisionByThreshold()

precision.foreach {
  case (t, p) =>
    println(s"Threshold is: $t, Precision is: $p")

    if (t == 0.5) {
      println(s"Desired: Threshold is: $t, Precision is: $p")
    }
}

Upvotes: 2

Views: 1531

Answers (1)

yoh.lej

Reputation: 1104

The precisionByThreshold() method tries each distinct score value as a threshold and reports the corresponding precision. Since you already thresholded your predictions, the only scores it sees are 0s and 1s.

Let's say you have: [0 0 0 1 1 1] after thresholding and the real labels are [f f f f t t].

Thresholding at 0, you predict [t t t t t t], which gives you 4 false positives and 2 true positives, hence a precision of 2 / (2 + 4) = 1/3.

Thresholding at 1, you predict [f f f t t t], which gives you 1 false positive and 2 true positives, hence a precision of 2 / (2 + 1) = 2/3.

You can see that a threshold of .5 would also give you [f f f t t t], the same as thresholding at 1, so the precision reported for threshold 1 is the one you are looking for.
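
To see this concretely, here is a minimal sketch that runs precisionByThreshold() on the toy data above (with f = 0.0 and t = 1.0), assuming an existing SparkContext named sc:

import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Thresholded predictions [0 0 0 1 1 1] paired with the true labels [f f f f t t].
val toyScoreAndLabels = sc.parallelize(Seq(
  (0.0, 0.0), (0.0, 0.0), (0.0, 0.0),
  (1.0, 0.0), (1.0, 1.0), (1.0, 1.0)
))

val toyMetrics = new BinaryClassificationMetrics(toyScoreAndLabels)
toyMetrics.precisionByThreshold().collect().foreach {
  case (t, p) => println(s"Threshold is: $t, Precision is: $p")
}

// Only the two distinct scores show up as thresholds:
// Threshold is: 1.0, Precision is: 0.666...   (2 TP / (2 TP + 1 FP))
// Threshold is: 0.0, Precision is: 0.333...   (2 TP / (2 TP + 4 FP))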

This is a bit confusing because you already thresholded your predictions. Suppose instead that you do not threshold and the raw scores are [.3 .4 .4 .6 .8 .9] (consistent with the [0 0 0 1 1 1] I have been using).

Then precisionByThreshold() would give you precision values for the thresholds .3, .4, .6, .8, and .9, because those are the distinct scores, and each produces a different set of predictions and thus a different precision. To get the value for threshold .5, you would again take the value for the next larger threshold (.6), because it yields the same predictions and hence the same precision. A sketch of this fix follows below.
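
So the practical fix is to feed BinaryClassificationMetrics raw scores rather than hard 0/1 predictions. Here is a minimal sketch building on the question's own model and data: clearThreshold() makes predict return the raw probability instead of a hard decision:

// Clear the threshold so model.predict returns raw probabilities
// instead of hard 0/1 decisions.
model.clearThreshold()

val scoreAndLabels = data.map {
  case LabeledPoint(label, features) => (model.predict(features), label)
}

val rawMetrics = new BinaryClassificationMetrics(scoreAndLabels)
rawMetrics.precisionByThreshold().collect().foreach {
  case (t, p) => println(s"Threshold is: $t, Precision is: $p")
}

Every distinct score now appears as a threshold, and the entry at (or just above) 0.5 is the precision you were asking for.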

Upvotes: 1
