PySpark - How to get precision / recall / ROC from TrainValidationSplit?

Question

My current approach to evaluate different parameters for LinearSVC and get the best one:

tokenizer = Tokenizer(inputCol="Text", outputCol="words")
wordsData = tokenizer.transform(df)

hashingTF = HashingTF(inputCol="words", outputCol="rawFeatures")
featurizedData = hashingTF.transform(wordsData)

idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)

LSVC = LinearSVC()

rescaledData = idfModel.transform(featurizedData)

paramGrid = ParamGridBuilder()\
                            .addGrid(LSVC.maxIter, [1])\
                            .addGrid(LSVC.regParam, [0.001, 10.0])\
                            .build()

crossval = TrainValidationSplit(estimator=LSVC,
                                estimatorParamMaps=paramGrid,
                                evaluator=MulticlassClassificationEvaluator(metricName="weightedPrecision"),
                                testRatio=0.01)

cvModel = crossval.fit(rescaledData.select("KA", "features").selectExpr("KA as label", "features as features"))

bestModel = cvModel.bestModel

Now I would like to get the basic parameters of ML (like precision, recall etc.), how do I get those?

PySpark - How to get precision / recall / ROC from TrainValidationSplit?

Answers (1)

Related Questions