Reputation: 1073
We are running a RandomForest model that creates 3 classifiers, and we want to calculate the AUC to evaluate our model in addition to accuracy.
Is there an approach for this if we are using spark.ml? Currently we call MulticlassClassificationEvaluator with the accuracy metric. Its documentation does not list AUC as a supported metric, only the following:
param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
Are there any examples of how to compute AUC in Spark?
We are running Spark 2.0, and here is our current setup, which evaluates the model using the accuracy metric:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

max_depth = model_params['max_depth']
num_trees = model_params['num_trees']
# Configure the RandomForest classifier.
rf = RandomForestClassifier(labelCol="label", featuresCol="features", impurity="gini",
                            featureSubsetStrategy="all", numTrees=num_trees, maxDepth=max_depth)
# Train model. This model fit is used for scoring future packages later.
model_fit = rf.fit(training_data)
# Make predictions.
transformed = model_fit.transform(test_data)
# Evaluate on test data if indicated
if model_params['calc_matrix'] is True:
    # Compare (prediction, true label) and compute overall accuracy
    evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                                  predictionCol="prediction", metricName="accuracy")
    accuracy = evaluator.evaluate(transformed)
    print("RF Overall Accuracy = {}, numTrees = {}, maxDepth = {}".
          format(accuracy, num_trees, max_depth))
Upvotes: 0
Views: 7284
Reputation: 1398
Area under the curve (AUC) makes sense only for binary classifiers, but you are using MulticlassClassificationEvaluator (which implies that the number of output classes is greater than 2).
Check BinaryClassificationEvaluator instead.
If, however, you want to build a multiclass classifier, you need multiclass accuracy.
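To make the binary-only point concrete, here is a minimal pure-Python sketch of what an area-under-ROC number measures: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. In spark.ml, BinaryClassificationEvaluator with metricName="areaUnderROC" reports the same kind of quantity for a binary label column. The helper names `binary_auc` and `one_vs_rest_auc` and the sample data are hypothetical, and the one-vs-rest reduction is only a common workaround for a three-class model, not something this answer or Spark prescribes.

```python
def binary_auc(labels, scores):
    """Area under the ROC curve for 0/1 labels and real-valued scores.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(labels, class_probabilities, positive_class):
    """Reduce a multiclass problem to binary: one class vs. all others."""
    binary_labels = [1 if y == positive_class else 0 for y in labels]
    scores = [probs[positive_class] for probs in class_probabilities]
    return binary_auc(binary_labels, scores)


# Hypothetical three-class output: true labels and per-class probabilities,
# similar in shape to the "label" and "probability" columns of `transformed`.
labels = [0, 1, 2, 0, 1, 2]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.2, 0.6],
]

for k in range(3):
    auc = one_vs_rest_auc(labels, probabilities, k)
    print("class {} vs rest AUC = {:.3f}".format(k, auc))
```

With three classes there is no single positive/negative split, which is why you get one AUC per class here rather than one number for the whole model.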
Upvotes: 1