rayqz

Reputation: 259

Multiple Evaluators in CrossValidator - Spark ML

Is it possible to use more than one evaluator in a CrossValidator, so that R2 and RMSE are computed at the same time?

Instead of having two different CrossValidators:

    val lr_evaluator_rmse = new RegressionEvaluator()
                           .setLabelCol("ArrDelay")
                           .setPredictionCol("predictionLR")
                           .setMetricName("rmse")
    
    val lr_evaluator_r2 = new RegressionEvaluator()
                         .setLabelCol("ArrDelay")
                         .setPredictionCol("predictionLR")
                         .setMetricName("r2")
    
    val lr_cv_rmse = new CrossValidator()
                      .setEstimator(lr_pipeline)
                      .setEvaluator(lr_evaluator_rmse)
                      .setEstimatorParamMaps(lr_paramGrid)
                      .setNumFolds(3)
                      .setParallelism(3)
    
    val lr_cv_r2 = new CrossValidator()
                  .setEstimator(lr_pipeline)
                  .setEvaluator(lr_evaluator_r2)
                  .setEstimatorParamMaps(lr_paramGrid)
                  .setNumFolds(3)
                  .setParallelism(3)

Something like this:

    val lr_cv = new CrossValidator()
                 .setEstimator(lr_pipeline)
                 .setEvaluator(lr_evaluator_rmse)
                 .setEvaluator(lr_evaluator_r2)
                 .setEstimatorParamMaps(lr_paramGrid)
                 .setNumFolds(3)
                 .setParallelism(3)

Thanks in advance

Upvotes: 1

Views: 369

Answers (1)

biscuits-and-jamie

Reputation: 101

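The PySpark documentation for CrossValidator shows that the evaluator argument takes a single evaluator: evaluator: Optional[pyspark.ml.evaluation.Evaluator] = None. The Scala setEvaluator setter likewise stores one Evaluator, so calling it a second time replaces the first evaluator instead of adding another.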

The solution I went with was to create a separate pipeline for each evaluator. For example,

    from pyspark.ml import Pipeline
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator, BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Assemble the input columns into a single feature vector
    # (`inputs` is a list of feature column names, defined elsewhere)
    vec_assembler = VectorAssembler(inputCols=inputs, outputCol="features")

    # Create the Random Forest classifier and one evaluator per metric
    rf = RandomForestClassifier(labelCol="label", seed=42)
    multiclass_evaluator = MulticlassClassificationEvaluator(predictionCol="prediction", labelCol="label", metricName="accuracy")
    # The binary evaluator scores the raw prediction column, not the hard 0/1 label
    binary_evaluator = BinaryClassificationEvaluator(rawPredictionCol="rawPrediction", labelCol="label")

    # CrossValidator needs a parameter grid to fit; these values are only illustrative
    param_grid = ParamGridBuilder().addGrid(rf.numTrees, [20, 50]).build()

    # Plop model objects into one cross validator per evaluator
    cv1 = CrossValidator(estimator=rf, estimatorParamMaps=param_grid, evaluator=multiclass_evaluator, numFolds=3, parallelism=4, seed=42)

    cv2 = CrossValidator(estimator=rf, estimatorParamMaps=param_grid, evaluator=binary_evaluator, numFolds=3, parallelism=4, seed=42)

    # Put all steps in a pipeline
    pipeline1 = Pipeline(stages=[vec_assembler, cv1])
    pipeline2 = Pipeline(stages=[vec_assembler, cv2])
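
If the goal is only to report a second metric (rather than to select a model by it), fitting twice may not be needed. The sketch below is one possible alternative, not something from the CrossValidator API itself: fit a single CrossValidator, then score the best model's predictions under several metrics by passing a param map to Evaluator.evaluate. The helper names evaluate_metrics and fit_and_report, and the train_df / test_df DataFrames, are assumptions made for illustration. Note that these are metrics for the selected model on held-out data, not per-fold averages; the fitted model's avgMetrics only covers the evaluator used for selection.

```python
def evaluate_metrics(evaluator, predictions, metric_names):
    """Score one predictions DataFrame under several metrics.

    Relies on Evaluator.evaluate(dataset, params), where `params` is a
    param map that overrides the evaluator's settings for that call only.
    """
    return {
        name: evaluator.evaluate(predictions, {evaluator.metricName: name})
        for name in metric_names
    }


def fit_and_report(cv, evaluator, train_df, test_df, metric_names=("rmse", "r2")):
    """Fit one CrossValidator, then report every metric for its best model.

    `cv` is a CrossValidator built with `evaluator` (whose own metric drives
    model selection). `train_df` and `test_df` are hypothetical DataFrames.
    """
    cv_model = cv.fit(train_df)
    # bestModel is the model refit with the best parameter map
    predictions = cv_model.bestModel.transform(test_df)
    return evaluate_metrics(evaluator, predictions, metric_names)
```

With the regression setup from the question, the call would look like fit_and_report(cv, evaluator, train_df, test_df), where evaluator is a RegressionEvaluator and the result is a dict keyed by "rmse" and "r2". Model selection still uses only the metric configured on the evaluator passed to the CrossValidator.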

Upvotes: 0
