Vusal

Reputation: 33

Xgboost on Spark Validation Indicator Column and Evaluation Metric

I am using the xgboost PySpark API. The API is experimental, but it supports most of the features of the regular xgboost API.

As per the documentation below, the eval_set parameter is not supported; the validationIndicatorCol parameter should be used instead (a sketch of creating that column precedes the classifier code below).

  1. https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.spark

  2. https://databricks.github.io/spark-deep-learning/#module-sparkdl.xgboost
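
For illustration, a minimal way to create the isVal indicator is to randomly tag a fraction of rows. The 20% fraction and the seed here are assumptions; sampled_df and isVal match the names used in the snippet that follows:

    from pyspark.sql import functions as F

    # Randomly mark ~20% of rows as validation rows; rows where isVal is
    # True are used as the evaluation set via validationIndicatorCol.
    sampled_df = sampled_df.withColumn("isVal", F.rand(seed=1) < 0.2)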

    from sparkdl.xgboost import XgboostClassifier
    from pyspark.ml import Pipeline

    xgb = XgboostClassifier(featuresCol="features",
                            labelCol="label",
                            num_workers=40,
                            random_state=1,
                            missing=None,
                            objective='binary:logistic',
                            validationIndicatorCol='isVal',
                            eval_metric='aucpr',
                            n_estimators=best_n_estimators,
                            max_depth=best_max_depth,
                            learning_rate=best_learning_rate)

    pipeline = Pipeline(stages=[vectorAssembler, xgb])
    pipelineModel = pipeline.fit(sampled_df)

It seems to be running without any errors, which is great.

How do you print and inspect the evaluation results? Traditional xgboost has an evals_result() method, but pipelineModel.stages[-1].evals_result() doesn't seem to work in the PySpark API, even though the documentation doesn't say it is unsupported. Any idea on how to make it work?
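
For reference, one workaround is to recompute the metric yourself by scoring the validation rows with Spark's built-in evaluator. This is only a sketch that reproduces the final aucpr number, not the per-iteration history that evals_result() would give:

    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    # Score only the rows flagged as validation data, then compute
    # the PR AUC to match eval_metric='aucpr' above.
    val_preds = pipelineModel.transform(sampled_df.filter("isVal"))
    evaluator = BinaryClassificationEvaluator(labelCol="label",
                                              metricName="areaUnderPR")
    print(evaluator.evaluate(val_preds))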

Upvotes: 0

Views: 878

Answers (1)

Shazna

Reputation: 1

Assuming you need to see the attributes recorded at the best iteration, this worked for me:

    xgb_model = model.stages[-1]  # last pipeline stage: the fitted xgboost model
    xgb_model.get_booster().attributes()  # returns the booster attributes recorded at the best iteration
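
Note that attributes() returns its values as strings. A quick follow-up sketch, assuming early stopping was enabled so that the best_iteration and best_score attributes were actually set:

    attrs = xgb_model.get_booster().attributes()
    # Both keys are set by xgboost only when early stopping runs.
    print(attrs.get("best_iteration"), attrs.get("best_score"))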

Upvotes: 0
