Reputation: 111
I am trying to run Spark (Java) code and getting the error
org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 27
Other posts have suggested setting the buffer to its maximum value. When I tried this with a max buffer value of 512MB, I got the error
java.lang.ClassNotFoundException: org.apache.spark.serializer.KryoSerializer.buffer.max', '512'
How can I solve this problem?
Upvotes: 11
Views: 26421
Reputation: 565
This is an old question, but it is the first hit when I googled this error, so I am answering here to help others.
For Spark 3.2 (in an Azure Synapse environment, though I am not sure whether that matters), I tried all of these combinations, but the only one that worked to convert a large Spark DataFrame with toPandas() was spark.kryoserializer.buffer.max=512: no letters after the number, no ".mb" at the end.
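A minimal sketch of that setting in a PySpark session (the app name and DataFrame are hypothetical, and running this requires a working Spark environment; in Azure Synapse a session usually already exists, in which case the property belongs in the pool's Spark configuration instead):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("kryo-buffer-demo")  # hypothetical app name
    # Plain number, no unit suffix, per this answer for Spark 3.2 on Synapse
    .config("spark.kryoserializer.buffer.max", "512")
    .getOrCreate()
)

df = spark.range(1_000_000)  # stand-in for the large DataFrame
pdf = df.toPandas()          # the conversion that previously overflowed the buffer
```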
Upvotes: 3
Reputation: 41
Either you can set this in the Spark configuration while creating the Spark session:
SparkSession.builder
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryoserializer.buffer.max", "512m")
    .getOrCreate()
or you can pass it with your spark-submit command:
spark-submit \
--verbose \
--name "JOB_NAME" \
--master MASTER_IP \
--conf "spark.kryoserializer.buffer.max=512m" \
main.py
Upvotes: 0
Reputation: 41
Try using "spark.kryoserializer.buffer.max.mb", "512" instead of "spark.kryoserializer.buffer.max", "512MB".
Upvotes: 4
Reputation: 1864
The property name is correct, spark.kryoserializer.buffer.max, and the value should include the unit, so in your case it is 512m.
Also, depending on where you are setting the configuration, you might have to write it as --conf spark.kryoserializer.buffer.max=512m. For instance, with spark-submit, or within the <spark-opts>...</spark-opts> element of an Oozie Spark workflow action.
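As a sketch, the Oozie action might look like the fragment below (the action name and surrounding workflow are hypothetical, and the elided elements stay elided; only the <spark-opts> line is the point here):

```xml
<action name="my-spark-job">  <!-- hypothetical action name -->
    <spark xmlns="uri:oozie:spark-action:0.2">
        <!-- ... master, job name, jar, and other elements ... -->
        <spark-opts>--conf spark.kryoserializer.buffer.max=512m</spark-opts>
    </spark>
    <ok to="end"/>
    <error to="fail"/>
</action>
```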
Upvotes: 4