Reputation: 367
I've noticed that most of Scalaz classes are not serializable. In this case, I'm trying to use a type class for custom-sorting an array in Spark.
A reduce example might be something like this:
> val ord = Order[T]{ ... }
> sc.makeRDD[T](...).grupBy(...).map {
case (_, grouped) => IList[T](grouped.toList).sorted(ord).distinct(ord)
}
As you should expect, this implementation throws a NotSerializableException
because Order[T]
is not serializable.
Is there any way to make Order[T]
serializable? In a perfect world, I would wish to avoid this problem still using scalaz. In a not-so-perfect one, I'm open to consider other implementations.
Should that happen, it's mandatory to keep the custom sorting and distinct implementations in a mantainable and extensible way.
Upvotes: 2
Views: 141
Reputation: 5572
If you need access to some non-serializable object you can wrap it in an object
:
scala> class NotSerializablePrinter { def print(msg:String) = println(msg) }
defined class NotSerializablePrinter
scala> val printer = new NotSerializablePrinter
printer: NotSerializablePrinter = $iwC$$iwC$NotSerializablePrinter@3b8afdbf
scala> val rdd = sc.parallelize(Array("1","2","3"))
rdd: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[24] at parallelize at <console>:30
scala> rdd.foreach(msg => printer.print(msg)) // Fails
org.apache.spark.SparkException: Task not serializable
...
scala> object wrap { val printer = new NotSerializablePrinter }
defined module wrap
scala> rdd.foreach(msg => wrap.printer.print(msg))
1
3
2
In your case you would replace my NotSerializablePrinter
instance with your Scalaz Order
instance. This example is pulled from this useful article(item 3a).
Upvotes: 6