udo_w_schmitt

Reputation: 1

Does Apache Spark MLlib 1.3.1 correctly compute multi-class precision and recall values?

Just tried the MulticlassMetrics feature in Spark MLlib 1.3.1 with a simple generic (label, prediction) input

(label, prediction)
( 1.0 , 1.0)
( 2.0 , 2.0)
( 3.0 , 3.0)
( 4.0 , 3.0)
( 4.0 , 4.0)
( 4.0 , 4.0)

and ran the following Scala snippet:

    labelsAndPredictions.foreach(println)

    val metrics = new MulticlassMetrics(labelsAndPredictions)
    println("confusionMatrix: ")        
    println(metrics.confusionMatrix)

    println("Precision: ")
    metrics.labels.foreach( x => println(x.toInt + " " + metrics.precision(x.toInt)) )

    println("Recall: ")
    metrics.labels.foreach( x => println(x.toInt + " " + metrics.recall(x.toInt)) )       

This returns the following precision values:
1   1.0
2   1.0
3   1.0
4   0.6666666666666666 

which seems to be at odds with what one would expect:

1   1.0
2   1.0
3   0.5
4   1.0

Precision: given all the predictions of a given class X, how many instances were correctly predicted? (See http://www.text-analytics101.com/2014/10/computing-precision-and-recall-for.html)

So for class label 4 I would expect

prec(4) = 1.0 (2 out of 2 are correct)

and for class label 3 I would expect

prec(3) = 0.5 (1 out of 2 are correct).
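The expected values above can be checked by hand. A minimal, Spark-free sketch (the `precision`/`recall` helpers below are my own, written to match the standard definitions, not MLlib's API):

```scala
// (label, prediction) pairs from the question
val labelsAndPredictions = Seq(
  (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
  (4.0, 3.0), (4.0, 4.0), (4.0, 4.0)
)

// precision(c) = correct predictions of class c / all predictions of class c
def precision(c: Double): Double = {
  val predicted = labelsAndPredictions.filter(_._2 == c)
  predicted.count(p => p._1 == p._2).toDouble / predicted.size
}

// recall(c) = correct predictions of class c / all true instances of class c
def recall(c: Double): Double = {
  val actual = labelsAndPredictions.filter(_._1 == c)
  actual.count(p => p._1 == p._2).toDouble / actual.size
}

Seq(1.0, 2.0, 3.0, 4.0).foreach { c =>
  println(s"class ${c.toInt}: precision = ${precision(c)}, recall = ${recall(c)}")
}
```

This prints precision 0.5 for class 3 (one of the two predictions of 3 is correct) and precision 1.0 for class 4 (both predictions of 4 are correct), i.e. the "expected" column above, and recall 2/3 for class 4, which is the number MLlib reported as precision.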

If I call MLlib's recall() on the same data set, I get exactly the values I would expect for precision.

Could it be that precision() and recall() in MLlib are currently incorrectly interchanged?

Any input or comments would be greatly appreciated. Thanks!

Upvotes: 0

Views: 471

Answers (1)

abayesed

Reputation: 11

The problem is that MulticlassMetrics expects predictionAndLabels, an RDD of (prediction, label) pairs. You've got it the other way around, which is why the precision and recall are flipped.
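To see why reversing the pair order flips the two metrics, here is a Spark-free sketch. The helper `precisionOf` is hypothetical; it just mirrors the convention that the first element of each pair is the prediction, as `MulticlassMetrics` assumes:

```scala
// Pairs exactly as in the question: (label, prediction) -- the WRONG order
val pairs = Seq(
  (1.0, 1.0), (2.0, 2.0), (3.0, 3.0),
  (4.0, 3.0), (4.0, 4.0), (4.0, 4.0)
)

// The fix: swap each pair so it becomes (prediction, label)
val swapped = pairs.map(_.swap)

// Treats _1 as the prediction and _2 as the label, as MulticlassMetrics does
def precisionOf(data: Seq[(Double, Double)], c: Double): Double = {
  val predictedAsC = data.filter(_._1 == c)
  predictedAsC.count(p => p._1 == p._2).toDouble / predictedAsC.size
}

// Wrong order: labels land in the prediction slot, so "precision" is recall
println(precisionOf(pairs, 4.0))   // 2/3, the value the question saw
// Correct order: the expected precision
println(precisionOf(swapped, 4.0)) // 1.0
```

In the original Spark code, the equivalent fix would be constructing the pairs as (prediction, label) up front, or applying a swap before building the metrics object.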

See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.mllib.evaluation.MulticlassMetrics

Upvotes: 0
