Reputation: 5556
I am trying to calculate the Root Mean Square Error (RMSE) manually on Spark (Scala 2.11).
As shown in the screenshot above, I calculate the squared error (SE) for each row:
val predicted_with_sqr_err = predicted.withColumn("se", pow(($"medianHouseValue" - $"prediction"), lit(2)))
Then I calculate Mean Square Error (MSE)
val sum_se = predicted_with_sqr_err.agg(sum("se")).first.get(0)
val sum_se_double = sum_se.toString.toDouble
val mean_sqr_err = (1.0/predicted_with_sqr_err.count)*sum_se_double
That worked fine. But it fails when I try to take the square root to get the Root Mean Square Error (RMSE):
val root_mean_sqr_err = sqrt(mean_sqr_err)
It gives this error:
<console>:83: error: overloaded method value sqrt with alternatives:
(colName: String)org.apache.spark.sql.Column <and>
(e: org.apache.spark.sql.Column)org.apache.spark.sql.Column
cannot be applied to (Double)
val root_mean_sqr_err = sqrt(mean_sqr_err)
How should I fix this?
Upvotes: 1
Views: 2678
Reputation: 317
The problem is that you are using the sqrt function defined in Spark SQL (org.apache.spark.sql.functions). This function should be used only as part of the Spark SQL DSL (in selections, aggregations, etc.). It takes a Column or a String (a column name) as its parameter, but you are passing a Double.
Instead, use the sqrt function defined in the scala.math package:
val root_mean_sqr_err = math.sqrt(mean_sqr_err)
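To illustrate the distinction outside Spark, here is a minimal sketch that computes RMSE on plain Scala collections, using math.pow and math.sqrt on Double values. The object name RmseSketch, the rmse helper, and the sample numbers are hypothetical and only stand in for the medianHouseValue and prediction columns from the question.

```scala
// A sketch, not the asker's actual pipeline: RMSE over plain Scala Doubles.
object RmseSketch {
  // Hypothetical helper: root mean square error of two equally sized sequences.
  def rmse(actual: Seq[Double], predicted: Seq[Double]): Double = {
    require(actual.nonEmpty && actual.length == predicted.length,
      "inputs must be non-empty and of equal length")
    // Squared error per row, mirroring the "se" column in the question.
    val squaredErrors = actual.zip(predicted).map { case (a, p) => math.pow(a - p, 2) }
    // Mean of the squared errors (MSE).
    val meanSquaredError = squaredErrors.sum / squaredErrors.length
    // scala.math.sqrt accepts a Double, unlike Spark SQL's sqrt.
    math.sqrt(meanSquaredError)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample values, for illustration only.
    val actual = Seq(3.0, 5.0, 7.0)
    val predicted = Seq(1.0, 5.0, 9.0)
    println(rmse(actual, predicted))
  }
}
```

In a session where org.apache.spark.sql.functions._ is imported, a bare sqrt resolves to the Spark SQL version, so qualifying the call as math.sqrt (or scala.math.sqrt) keeps the two apart.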
Upvotes: 2