Reputation: 1815
I am running cross-validation on my dataset over a grid of hyperparameters:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
paramGrid = ParamGridBuilder() \
    .addGrid(lr.regParam, [0, 0.01, 0.05, 0.1, 0.5, 1]) \
    .addGrid(lr.elasticNetParam, [0.0, 0.1, 0.5, 0.8, 1]) \
    .build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid, evaluator=evaluator)
I want to know the best values for regParam and elasticNetParam. In Python's scikit-learn we can read the best parameters off the fitted object after cross-validation (e.g. GridSearchCV.best_params_). Is there a method in PySpark to get the best parameter values after cross-validation?
For example: regParam - 0.05
elasticNetParam - 0.1
Upvotes: 2
Views: 5202
Reputation: 860
Let's say you've built a logistic regression model using the below arguments.
lr = LogisticRegression()
paramGrid = ParamGridBuilder() \
    .addGrid(lr.regParam, [0, 0.01, 0.05, 0.1, 0.5, 1]) \
    .addGrid(lr.elasticNetParam, [0.0, 0.1, 0.5, 0.8, 1]) \
    .build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid, evaluator=evaluator)
cv_model = cv.fit(train_data)
You can extract the best model's parameter map using the following code (this needs numpy; np.argmax is the right choice here because BinaryClassificationEvaluator's default metric, areaUnderROC, is higher-is-better):
import numpy as np

print(cv_model.getEstimatorParamMaps()[np.argmax(cv_model.avgMetrics)])
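To see what that one-liner does, here is a minimal pure-Python stand-in (the param maps and metric values are made up for illustration; in PySpark the maps are keyed by Param objects rather than strings): each entry of avgMetrics is the cross-validated score for the param map at the same index, so the argmax index picks out the winning combination.

```python
# Stand-ins for cv_model.getEstimatorParamMaps() and cv_model.avgMetrics.
param_maps = [
    {"regParam": 0.01, "elasticNetParam": 0.0},
    {"regParam": 0.05, "elasticNetParam": 0.1},
    {"regParam": 0.5,  "elasticNetParam": 0.8},
]
avg_metrics = [0.71, 0.78, 0.64]  # one score per param map, same order

# Equivalent of np.argmax(avg_metrics), without numpy:
best_index = max(range(len(avg_metrics)), key=avg_metrics.__getitem__)
best_params = param_maps[best_index]

print(best_params)  # {'regParam': 0.05, 'elasticNetParam': 0.1}
```

The key invariant is that avgMetrics and getEstimatorParamMaps() are parallel lists, so one index addresses both.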
Upvotes: 2
Reputation: 2372
Well, you have to fit your CrossValidator first:
cv_model = cv.fit(train_data)
After you do that, the best model is available as:
best_model = cv_model.bestModel
To extract the parameters, you will have to do this ugly thing and reach into the underlying Java object:
best_reg_param = best_model._java_obj.getRegParam()
best_elasticnet_param = best_model._java_obj.getElasticNetParam()
(In recent PySpark versions the fitted model also exposes these getters directly, e.g. best_model.getRegParam(), so the _java_obj detour may not be needed.)
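A public-API alternative (assuming a reasonably recent PySpark) is best_model.extractParamMap(), which returns a map keyed by Param objects; you can reshape it into a plain name-to-value dict. A pure-Python stand-in of that reshaping step, with a tiny class mimicking the .name attribute of pyspark.ml.param.Param:

```python
# Stand-in for pyspark.ml.param.Param: only the .name attribute matters here.
class Param:
    def __init__(self, name):
        self.name = name

# Stand-in for the ParamMap returned by best_model.extractParamMap().
param_map = {Param("regParam"): 0.05, Param("elasticNetParam"): 0.1}

# Reshape the Param-keyed map into a plain name -> value dict.
best_params = {p.name: v for p, v in param_map.items()}
print(best_params["regParam"])         # 0.05
print(best_params["elasticNetParam"])  # 0.1
```

This avoids the private _java_obj attribute entirely and works uniformly for any estimator's params.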
Upvotes: 3