Reputation: 33
I am using the xgboost PySpark API. This API is experimental, but it supports most of the features of the regular xgboost API.
As per the documentation below, the eval_set parameter is not supported; the validationIndicatorCol parameter should be used instead.
https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.spark
https://databricks.github.io/spark-deep-learning/#module-sparkdl.xgboost
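The isVal column is just a boolean flag marking which rows are used for validation. One minimal way to create such a column (a sketch, assuming a random ~20% split; the fraction is arbitrary) is:

from pyspark.sql import functions as F

# flag roughly 20% of rows as the validation set for eval_metric
sampled_df = sampled_df.withColumn("isVal", F.rand(seed=1) < 0.2)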
from sparkdl.xgboost import XgboostClassifier  # Databricks sparkdl flavor of the API
from pyspark.ml import Pipeline

xgb = XgboostClassifier(featuresCol="features",
                        labelCol="label",
                        num_workers=40,
                        random_state=1,
                        missing=None,
                        objective='binary:logistic',
                        validationIndicatorCol='isVal',
                        eval_metric='aucpr',
                        n_estimators=best_n_estimators,
                        max_depth=best_max_depth,
                        learning_rate=best_learning_rate)

pipeline = Pipeline(stages=[vectorAssembler, xgb])
pipelineModel = pipeline.fit(sampled_df)
It seems to be running without any errors, which is great.
How do you print and inspect the evaluation results? Traditional xgboost has an evals_result() method, but pipelineModel.stages[-1].evals_result() doesn't seem to work in the PySpark API. I expected it to, since the PySpark API documentation doesn't say it's unsupported. Any idea how to make it work?
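For comparison, this is roughly the non-Spark pattern I was expecting to carry over (a self-contained sketch with random toy data):

import numpy as np
from xgboost import XGBClassifier

# toy data just to illustrate evals_result() in the plain sklearn-style API
X = np.random.rand(200, 5)
y = np.random.randint(0, 2, size=200)
X_train, X_val, y_train, y_val = X[:150], X[150:], y[:150], y[150:]

clf = XGBClassifier(objective='binary:logistic', eval_metric='aucpr', n_estimators=20)
clf.fit(X_train, y_train, eval_set=[(X_val, y_val)])
print(clf.evals_result())  # {'validation_0': {'aucpr': [... one value per boosting round ...]}}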
Upvotes: 0
Views: 878
Reputation: 1
Assuming you want to see what was recorded at the best iteration, this worked for me:
xgb_model = model.stages[-1]  # model is the fitted PipelineModel (pipelineModel in the question)
xgb_model.get_booster().attributes()  # returns the attributes set on the booster, e.g. best_iteration / best_score
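If early stopping was enabled during training, the keys that typically show up are best_iteration and best_score (this is an assumption about your training setup; attributes() just returns whatever string attributes were set on the booster):

attrs = xgb_model.get_booster().attributes()  # dict of attribute name -> string value
print(attrs.get('best_iteration'), attrs.get('best_score'))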
Upvotes: 0