merkle

Reputation: 1815

How to get the best hyperparameter values after cross-validation in PySpark?

I am running cross-validation on a dataset over a grid of hyperparameters.

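from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
paramGrid = ParamGridBuilder() \
    .addGrid(lr.regParam, [0, 0.01, 0.05, 0.1, 0.5, 1]) \
    .addGrid(lr.elasticNetParam, [0.0, 0.1, 0.5, 0.8, 1]) \
    .build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid, evaluator=evaluator)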

I want to know the best values of regParam and elasticNetParam. In Python there is an option to get the best parameters after cross-validation. Is there a method in PySpark that returns the best parameter values after cross-validation?

For example:
regParam - 0.05
elasticNetParam - 0.1

Upvotes: 2

Views: 5202

Answers (2)

Terminator17

Reputation: 860

Let's say you've fitted a cross-validated logistic regression model as follows.

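from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
paramGrid = ParamGridBuilder() \
    .addGrid(lr.regParam, [0, 0.01, 0.05, 0.1, 0.5, 1]) \
    .addGrid(lr.elasticNetParam, [0.0, 0.1, 0.5, 0.8, 1]) \
    .build()
evaluator = BinaryClassificationEvaluator()
cv = CrossValidator(estimator=lr, estimatorParamMaps=paramGrid, evaluator=evaluator)
cv_model = cv.fit(train_data)  # train_data is the training DataFrame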

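You can extract the best parameter combination by indexing the parameter maps with the position of the best average metric. np.argmax is the right choice here because the default metric of BinaryClassificationEvaluator, areaUnderROC, is larger-is-better; for an error metric such as RMSE you would use np.argmin instead:

import numpy as np
print(cv_model.getEstimatorParamMaps()[np.argmax(cv_model.avgMetrics)])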
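The same lookup can be written without NumPy and without hard-coding the metric direction. The following is only a minimal sketch: best_param_map is a hypothetical helper name, not part of PySpark, and it assumes the parameter maps and average metrics are in the same order, as they are on a fitted CrossValidatorModel.

```python
def best_param_map(param_maps, avg_metrics, larger_is_better=True):
    """Return (params, metric) for the best-scoring entry of a CV grid.

    param_maps:       list of dicts, e.g. cv_model.getEstimatorParamMaps()
    avg_metrics:      list of floats, e.g. cv_model.avgMetrics (same order)
    larger_is_better: e.g. evaluator.isLargerBetter() for the evaluator used
    """
    if len(param_maps) != len(avg_metrics):
        raise ValueError("param_maps and avg_metrics must have the same length")
    # Pick the index of the highest or lowest average metric.
    pick = max if larger_is_better else min
    best_index = pick(range(len(avg_metrics)), key=lambda i: avg_metrics[i])
    # Spark Param keys carry a .name attribute; plain string keys are kept as-is.
    readable = {getattr(p, "name", p): v
                for p, v in param_maps[best_index].items()}
    return readable, avg_metrics[best_index]
```

With the objects from this answer, the call would be best_param_map(cv_model.getEstimatorParamMaps(), cv_model.avgMetrics, evaluator.isLargerBetter()), which returns a plain dict keyed by parameter name together with the winning average metric.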

Upvotes: 2

Manu Valdés

Reputation: 2372

Well, you have to fit your CrossValidator first:

cv_model = cv.fit(train_data)

Once it is fitted, the best model is available as:

best_model = cv_model.bestModel

To extract the parameters, you have to do this ugly thing, reaching into the wrapped Java object through the private _java_obj attribute:

best_reg_param = best_model._java_obj.getRegParam()
best_elasticnet_param = best_model._java_obj.getElasticNetParam()
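As far as I'm aware, newer PySpark releases (3.0 onward) expose getters such as getRegParam() directly on the fitted model, so the private attribute is only needed on older versions. A minimal sketch that prefers the public getter and falls back to _java_obj; get_model_param is a hypothetical helper name, not a PySpark function:

```python
def get_model_param(model, name):
    """Read a fitted model's parameter, e.g. name="regParam".

    Tries the public getter (model.getRegParam()) first and falls back to
    the wrapped Java object (model._java_obj.getRegParam()) if the Python
    model does not define it.
    """
    getter = "get" + name[0].upper() + name[1:]
    if hasattr(model, getter):
        return getattr(model, getter)()
    # Fallback for models without a public getter: private attribute,
    # which may change between releases.
    return getattr(model._java_obj, getter)()
```

Usage would be get_model_param(best_model, "regParam") and get_model_param(best_model, "elasticNetParam").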

Upvotes: 3
