Reputation: 1973
I am looking for a multiclass classification example using Spark-Scala, but I haven't been able to find one yet. Specifically, I want to train a classification model and see all the associated metrics on both the training and test data.
Does Spark ML (the DataFrame-based API) support a confusion matrix for multiclass problems?
I am looking for examples for Spark 2.2 and above. An end-to-end example would be really useful. I can't find confusion matrix evaluation here:
https://spark.apache.org/docs/2.3.0/ml-classification-regression.html
Upvotes: 2
Views: 3921
Reputation: 13690
Assuming that model is your trained model and test is the test set, here is a snippet for calculating the confusion matrix in Python:
import pandas as pd
from pyspark.mllib.evaluation import MulticlassMetrics
# MulticlassMetrics expects (prediction, label) pairs, in that order, as floats
predictionAndLabels = model.transform(test).select('prediction', 'label')
metrics = MulticlassMetrics(predictionAndLabels.rdd.map(lambda x: tuple(map(float, x))))
# Rows are actual labels, columns are predicted labels, both in ascending label order
confusion_matrix = metrics.confusionMatrix().toArray()
labels = [int(l) for l in metrics.call('labels')]
confusion_matrix = pd.DataFrame(confusion_matrix, index=labels, columns=labels)
Note that metrics.labels is not implemented in PySpark for some reason, so we call the Scala backend directly with metrics.call('labels').
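To make the layout of the returned matrix concrete, here is a stdlib-only Scala sketch. The helper confusionMatrix and the object ConfusionMatrixSketch are hypothetical, not Spark's code; they only mirror what the Spark docs describe for MulticlassMetrics.confusionMatrix: rows are actual labels, columns are predicted labels, both in ascending label order, with the classes taken from the label values. It also shows why the (prediction, label) order matters: swapping the two values would transpose the matrix (when predictions and labels cover the same classes).

```scala
object ConfusionMatrixSketch {
  /** Hypothetical helper: rows = actual label, columns = predicted label,
    * both in ascending label order (the layout documented for MulticlassMetrics). */
  def confusionMatrix(predictionAndLabels: Seq[(Double, Double)]): (Array[Double], Array[Array[Long]]) = {
    // Classes come from the label (second) position, sorted ascending
    val labels = predictionAndLabels.map(_._2).distinct.sorted.toArray
    val index = labels.zipWithIndex.toMap
    val counts = Array.fill(labels.length, labels.length)(0L)
    for ((prediction, label) <- predictionAndLabels; j <- index.get(prediction)) {
      // A prediction that never occurs as a label has no column and is skipped
      counts(index(label))(j) += 1
    }
    (labels, counts)
  }

  def main(args: Array[String]): Unit = {
    // (prediction, label) pairs, the same order MulticlassMetrics expects
    val pairs = Seq((0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0), (1.0, 2.0))
    val (labels, counts) = confusionMatrix(pairs)
    println(labels.mkString(" "))
    counts.foreach(row => println(row.mkString(" ")))
  }
}
```

Running main prints the labels 0.0 1.0 2.0 followed by the rows 1 1 0, 0 1 0 and 0 1 1: for example, the single (1.0, 2.0) pair lands in row "actual 2.0", column "predicted 1.0".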
Upvotes: 1
Reputation: 117
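Assuming model is the trained model and test is the test set, this should be it:
import org.apache.spark.mllib.evaluation.MulticlassMetrics
val predictionAndLabels = model.transform(test).selectExpr("prediction", "cast(label as double)").rdd.map(r => (r.getDouble(0), r.getDouble(1))) // (prediction, label) pairs
val metrics = new MulticlassMetrics(predictionAndLabels)
println(metrics.confusionMatrix)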
The RDD-based evaluation metrics, including MulticlassMetrics, are documented here: https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html
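For the end-to-end case asked about in the question, here is a minimal sketch for Spark 2.2+, assuming a local SparkSession. The object name MulticlassExample, the synthetic three-class toy data and the choice of LogisticRegression are illustrative assumptions, not taken from the answers above. It uses the DataFrame-based MulticlassClassificationEvaluator for accuracy, f1, weightedPrecision and weightedRecall on both the training and test predictions, and the RDD-based MulticlassMetrics for the confusion matrix, since the evaluator returns a single scalar per call.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.{DataFrame, SparkSession}

object MulticlassExample {
  def run(spark: SparkSession): MulticlassMetrics = {
    import spark.implicits._

    // Hypothetical toy data: 3 well-separated classes, 100 rows each
    val rows = (0 until 300).map { i =>
      val label = (i % 3).toDouble
      val jitter = (i % 7) * 0.1
      (label, Vectors.dense(2 * label + jitter, -2 * label - jitter))
    }
    val data = spark.createDataFrame(rows).toDF("label", "features")
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    // With more than two classes, the default family "auto" fits a multinomial model
    val model = new LogisticRegression().setMaxIter(50).fit(train)

    // DataFrame-based metrics, reported separately for training and test predictions
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label")
      .setPredictionCol("prediction")
    def report(name: String, df: DataFrame): Unit = {
      val predictions = model.transform(df)
      for (metric <- Seq("accuracy", "f1", "weightedPrecision", "weightedRecall")) {
        val value = evaluator.setMetricName(metric).evaluate(predictions)
        println(s"$name $metric = $value")
      }
    }
    report("train", train)
    report("test", test)

    // Confusion matrix from the RDD-based MulticlassMetrics; pairs must be (prediction, label)
    val predictionAndLabels = model.transform(test)
      .select("prediction", "label")
      .as[(Double, Double)]
      .rdd
    val metrics = new MulticlassMetrics(predictionAndLabels)
    println(metrics.labels.mkString("labels (row/column order): ", ", ", ""))
    println(metrics.confusionMatrix) // rows = actual label, columns = predicted label
    metrics
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multiclass-metrics-sketch")
      .master("local[*]")
      .getOrCreate()
    try run(spark) finally spark.stop()
  }
}
```

Any other estimator that adds a prediction column (for example RandomForestClassifier) should be able to replace LogisticRegression without changing the evaluation part.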
Upvotes: 1