bruce

Reputation: 189

org.apache.spark.sql.AnalysisException: Can't extract value from probability

I am using the Naive Bayes algorithm to classify articles, and I want to access the "probability" column of the results:

    val Array(trainingDF, testDF) = rawDataDF.randomSplit(Array(0.6, 0.4))
    val ppline = MyUtil.createTrainPpline(rawDataDF)
    val model = ppline.fit(trainingDF)
    val testRes = model.transform(testDF)
    testRes.filter($"probability"(0).as[Double] === 1).show()

The last line throws:

    Exception in thread "main" org.apache.spark.sql.AnalysisException: Can't extract value from probability#133;
            at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
            at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:616)
            at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$9$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:608)
            at 

Upvotes: 4

Views: 4135

Answers (2)

btbbass

Reputation: 145

Note that there are several issues open to track this:

https://issues.apache.org/jira/browse/SPARK-19653

https://issues.apache.org/jira/browse/SPARK-12806

For the moment, Vector is not a "first class citizen" in the Spark SQL API.
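Until those issues are resolved, a common workaround (a sketch, not part of the linked JIRAs; assumes Spark 2.x with `org.apache.spark.ml.linalg.Vector`) is to wrap the element access in a UDF, since Spark SQL cannot index into a Vector column directly:

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.udf

// Extract the first element of the probability vector via a UDF,
// because $"probability"(0) fails to resolve on a Vector column.
val firstProb = udf((v: Vector) => v(0))

testRes.filter(firstProb($"probability") === 1.0).show()
```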

Upvotes: 1

jamborta

Reputation: 5210

You can always get the underlying RDD and filter that:

    import org.apache.spark.ml.linalg.Vector

    val filteredRes = results.rdd.filter(row => row.getAs[Vector]("probability")(0) == 1)

Then you can convert it back to a dataframe if you need to:

val df = spark.createDataFrame(filteredRes, results.schema)

Upvotes: 2
