Reputation: 21
I have a dataset with two classes (churners and non-churners) in a 1:4 ratio. I trained a Random Forest model with Spark MLlib, but it is terrible at predicting the churn class and effectively never predicts churn. I use BinaryClassificationEvaluator in PySpark to evaluate the model; its default metric is areaUnderROC.
My code:
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import BinaryClassificationEvaluator

evaluator = BinaryClassificationEvaluator()
# Create an initial RandomForest model.
rf = RandomForestClassifier(labelCol="label", featuresCol="indexedFeatures", numTrees=1000, impurity="entropy")
# Train model with Training Data
rfModel = rf.fit(train_df)
rfModel.featureImportances
# Make predictions on test data using the Transformer.transform() method.
predictions = rfModel.transform(test_df)
# Evaluate with areaUnderROC (the evaluator's default metric)
auc = evaluator.evaluate(predictions)
print('Test Area Under ROC', auc)
Test Area Under Roc 0.8672196520652589
Here is the confusion matrix. Since TP = 0, how is that score possible? Could this value be wrong? I have other models that work fine, but this score makes me wonder whether the others are wrong as well.
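To illustrate what I find confusing: areaUnderROC is threshold-independent, it measures how well the model *ranks* positives above negatives across all thresholds, while my confusion matrix uses the default 0.5 cut-off. A minimal pure-Python sketch (with made-up scores, not from my dataset) shows both can happen at once, TP = 0 at the 0.5 cut-off yet a perfect AUC:

```python
# Hypothetical churn scores: every positive scores below 0.5
# (so thresholding at 0.5 predicts no churners at all),
# but every positive still outranks every negative.
pos_scores = [0.40, 0.35, 0.30]        # churners (label 1)
neg_scores = [0.20, 0.15, 0.10, 0.05]  # non-churners (label 0)

# Default threshold 0.5: no score reaches it, so TP = 0.
tp = sum(s >= 0.5 for s in pos_scores)

# AUC = probability that a random positive outranks a random negative
# (Mann-Whitney formulation; ties count as half).
pairs = [(p, n) for p in pos_scores for n in neg_scores]
auc = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)

print(tp)   # 0
print(auc)  # 1.0
```

So a high AUC alongside TP = 0 is not necessarily a bug; it can simply mean the default threshold is wrong for the 1:4 class imbalance.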
Upvotes: 2
Views: 4749
Reputation: 29
Your data might be heavily biased towards one of the classes; I would recommend using precision or the F-measure, since they are better metrics in such situations. Try this:
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// BinaryClassificationMetrics expects an RDD of (score, label) pairs,
// so extract the positive-class probability from the predictions DataFrame
val scoreAndLabels = predictions.select("probability", "label").rdd
  .map(row => (row.getAs[org.apache.spark.ml.linalg.Vector](0)(1), row.getDouble(1)))

val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val f1Score = metrics.fMeasureByThreshold
f1Score.collect.foreach { case (t, f) =>
  println(s"Threshold: $t, F-score: $f, Beta = 1")
}
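Since the question uses PySpark, note that `fMeasureByThreshold` is only exposed in the Scala API; in Python you would typically use `MulticlassClassificationEvaluator(metricName="f1")` instead. What `fMeasureByThreshold` computes can be sketched in plain Python (with hypothetical scores and labels, not the question's data):

```python
# Sketch of fMeasureByThreshold: the F1 score at each candidate threshold.
# Each pair is (predicted score for the positive class, true label).
scored = [(0.9, 1), (0.8, 0), (0.7, 1), (0.4, 1), (0.3, 0), (0.1, 0)]

def f1_at(threshold, scored):
    """F1 when everything scoring >= threshold is predicted positive."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Sweep every distinct score as a threshold, as the Spark method does.
for t in sorted({s for s, _ in scored}, reverse=True):
    print(f"Threshold: {t}, F-score: {f1_at(t, scored):.3f}")
```

Picking the threshold that maximises F1 (rather than the default 0.5) is often the practical fix for imbalanced classes like the 1:4 split here.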
Upvotes: -1