Reputation: 1144
The following is a minimal example of the problem I am facing. I have an array that I want to modify in place, as it has about a million elements. The following code works except for the very last statement.
import spark.implicits._

case class Frame(x: Double, var y: Array[Double]) {
  def total(): Double = {
    return y.sum
  }
  def modifier(): Unit = {
    for (i <- 0 until y.length) {
      y(i) += 10
    }
    return
  }
}

val df = Seq(
  (1.0, Array(0, 2, 1)),
  (8.0, Array(1, 2, 3)),
  (9.0, Array(11, 21, 23))
).toDF("x", "y")
val ds = df.as[Frame]
ds.show
ds.map(_.total()).show // works
ds.map(_.modifier()).show // does not work
The error is as follows:
scala> ds.map(_.modifier()).show
<console>:50: error: Unable to find encoder for type Unit. An implicit Encoder[Unit] is needed to store Unit instances in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
ds.map(_.modifier()).show
I cannot see the origin of the problem. I would be grateful for any help in fixing this bug.
Upvotes: 1
Views: 123
Reputation: 27373
Actually, this has nothing to do with `var` or `val`; it's about mutable data structures. The problem is that `modifier` returns `Unit` (i.e., nothing), so you cannot map over its result, because Spark has no encoder for `Unit`. You can make it run like this:
case class Frame(x: Double, var y: Array[Double]) {
  def total(): Double = {
    return y.sum
  }
  def modifier(): Frame = {
    for (i <- 0 until y.length) {
      y(i) += 10
    }
    return this
  }
}
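As a standalone sketch (without Spark, and with the Spark-specific parts omitted), this is why the fixed version can be mapped: `modifier` now returns the mutated `Frame` itself, so `map` has a value to encode.

```scala
// Minimal Spark-free check: modifier mutates y in place and returns
// the Frame, so the call yields a usable result instead of Unit.
case class Frame(x: Double, var y: Array[Double]) {
  def modifier(): Frame = {
    for (i <- 0 until y.length) y(i) += 10
    this
  }
}

val f = Frame(1.0, Array(0.0, 2.0, 1.0)).modifier()
println(f.y.mkString(","))  // 10.0,12.0,11.0
```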
But it does not make much sense in my opinion; you should avoid mutable state. In addition, I would keep case classes simple (i.e., without logic) in Spark and use them as data containers only. If you must increase every element by ten, you can also do it like this:
case class Frame(x: Double, y: Array[Double])
ds.map(fr => fr.copy(y = fr.y.map(_+10.0))).show
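The same copy-based update can be exercised without Spark; this sketch shows that `copy` leaves the original `Frame` untouched and produces a new one with the incremented array.

```scala
// Immutable case class used purely as a data container.
case class Frame(x: Double, y: Array[Double])

val fr = Frame(1.0, Array(0.0, 2.0, 1.0))
// copy() builds a new Frame; the original array is not mutated.
val updated = fr.copy(y = fr.y.map(_ + 10.0))

println(updated.y.mkString(","))  // 10.0,12.0,11.0
println(fr.y.mkString(","))       // 0.0,2.0,1.0 (unchanged)
```

This is the pattern the `ds.map` line above applies per row: each record is replaced by a fresh, fully built value, which is exactly what Spark's encoders are designed to serialize.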
Upvotes: 1