Regressor

Reputation: 1973

Evaluation metrics on Spark ML multiclass classification problem

I am looking for a multiclass classification example using Spark-Scala, but I have not been able to find one. Specifically, I want to train a classification model and see all the associated metrics on the training and test data.

Does Spark ML (DataFrame based API) support confusion matrix on multi-class problems?

I am looking for examples for Spark 2.2 and above. An end-to-end example would be really useful. I can't find confusion matrix evaluation here:

https://spark.apache.org/docs/2.3.0/ml-classification-regression.html

Upvotes: 2

Views: 3921

Answers (2)

Uri Goren

Reputation: 13690

Assuming that model is your trained model and test is the test set, this is a code snippet for calculating the confusion matrix in Python:

import pandas as pd
from pyspark.mllib.evaluation import MulticlassMetrics

# MulticlassMetrics expects (prediction, label) pairs
predictionAndLabels = model.transform(test).select('prediction', 'label')
metrics = MulticlassMetrics(predictionAndLabels.rdd.map(lambda x: tuple(map(float, x))))

# Confusion matrix as a pandas DataFrame indexed by class label
confusion_matrix = metrics.confusionMatrix().toArray()
labels = [int(l) for l in metrics.call('labels')]
confusion_matrix = pd.DataFrame(confusion_matrix, index=labels, columns=labels)

Note that metrics.labels is not implemented in PySpark for some reason, so we call the Scala backend directly with metrics.call('labels').

Upvotes: 1

Matko Soric

Reputation: 117

This should be it:

import org.apache.spark.mllib.evaluation.MulticlassMetrics
// predictionAndLabels: RDD[(Double, Double)] of (prediction, label) pairs
val metrics = new MulticlassMetrics(predictionAndLabels)
println(metrics.confusionMatrix)

Classification metrics are documented here: https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html
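For the end-to-end part of the question, here is a minimal sketch of how predictionAndLabels could be produced with the DataFrame-based API before handing it to MulticlassMetrics. The dataset path, the choice of LogisticRegression, and the split are illustrative assumptions, not part of the original answer:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("MulticlassMetricsExample").getOrCreate()

// Illustrative multiclass dataset in libsvm format (numeric label + features)
val data = spark.read.format("libsvm")
  .load("data/mllib/sample_multiclass_classification_data.txt")
val Array(training, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

// Any spark.ml classifier works; LogisticRegression handles multinomial labels
val model = new LogisticRegression().fit(training)

// transform() appends a "prediction" column alongside the original "label"
val predictions = model.transform(test)

// MulticlassMetrics (RDD-based API) expects (prediction, label) pairs
val predictionAndLabels = predictions
  .select("prediction", "label")
  .rdd
  .map(row => (row.getDouble(0), row.getDouble(1)))

val metrics = new MulticlassMetrics(predictionAndLabels)
println(metrics.confusionMatrix)
println(s"Accuracy: ${metrics.accuracy}")

The same metrics object also exposes weightedPrecision, weightedRecall, and per-label precision(label) / recall(label), which covers the "all the associated metrics" part of the question.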

Upvotes: 1
