Reputation: 4940
I would like to get the cross-validation's (internal) training accuracy, using PySpark's ML library:
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

lr = LogisticRegression()
param_grid = (ParamGridBuilder()
              .addGrid(lr.regParam, [0.01, 0.5])
              .addGrid(lr.maxIter, [5, 10])
              .addGrid(lr.elasticNetParam, [0.01, 0.1])
              .build())
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=5)
model_cv = cv.fit(train)
predictions_lr = model_cv.transform(validation)
predictions = evaluator.evaluate(predictions_lr)
In order to get the accuracy metric for each cross-validation fold, I have tried:
print(model_cv.subModels)
but the result of this call is empty (None).
How could I get the accuracy of each fold?
Upvotes: 1
Views: 727
Reputation: 194
I know this is old, but in case someone is looking: for Spark to keep the non-best model(s) fitted during cross-validation, you need to enable collection of sub-models when creating the CrossValidator. Just set collectSubModels=True (it is False by default).
i.e.
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=5,
                    collectSubModels=True)
Upvotes: 1