Reputation: 73
I am unable to use the Kryo serializer in Spark 2.0.2. In my Scala driver code, I have:
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
parkConf.set("spark.kryoserializer.buffer.max","64m")
parkConf.set("spark.kryoserializer.buffer","64k")
However, this generates the following error:
[Stage 0:> (0 + 1) / 4]17/03/30 10:15:34 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 157. To avoid this, increase spark.kryoserializer.buffer.max value.
In addition, I tried setting the same properties in spark-defaults.conf and got the same error. Given that the error reports an "Available" size of 0, it would seem my settings are being ignored.
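For reference, this is roughly what I put in spark-defaults.conf (a minimal sketch using the same property names and example values as the driver code above):

spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max   64m
spark.kryoserializer.buffer       64k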
Upvotes: 3
Views: 3457
Reputation: 73
I now understand: "spark.kryoserializer.buffer.max" must be large enough to hold all of the data in the partition, not just a single record. For a partition containing 512 MB of 256-byte arrays, buffer.max must be on the order of 768 MB. I didn't see this explained anywhere in the docs, and was under the impression that buffer.max only had to be large enough to hold the largest serialized record in the partition.
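For what it's worth, a minimal sketch of the driver setup that follows from this; the 768m value comes from the partition size above, and the app name and repartition count are just illustrative assumptions, not anything mandated by Spark:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val sparkConf = new SparkConf()
  .setAppName("kryo-buffer-example")  // illustrative name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer", "64k")       // initial buffer size
  .set("spark.kryoserializer.buffer.max", "768m")  // must cover the partition's serialized data

val spark = SparkSession.builder().config(sparkConf).getOrCreate()

// Alternatively, keep buffer.max smaller and split the data into more,
// smaller partitions so each one fits in the buffer, e.g.:
// myRdd.repartition(64)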
Upvotes: 3