Juh_

Reputation: 15549

How to investigate a kryo buffer overflow happening in spark?

I encountered a Kryo buffer overflow exception, but I really don't understand what data could require more than the current buffer size. I already have spark.kryoserializer.buffer.max set to 256m, and even a toString applied to the dataset items, which should be much bigger than what Kryo requires, takes less than that (per item).

I know I can increase this parameter, and I will for now, but I don't think it is good practice to simply increase a resource limit when hitting a bound without investigating what is happening (just as, if I got an OOM, I wouldn't simply increase the RAM allocation without checking what is using more RAM).

=> So, is there a way to investigate what is put into the buffer during the Spark DAG execution?
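Spark doesn't expose per-record serializer usage in the UI, but one way to narrow it down is to sample records and measure their serialized size yourself, to find the outlier that blows the buffer. The sketch below uses Python's pickle as a rough stand-in for Kryo (absolute sizes differ, but the record that serializes largest under one scheme is usually the same one that dominates the Kryo buffer); the sample data is hypothetical.

```python
import pickle

def largest_serialized_record(records):
    """Return (size_in_bytes, record) for the record whose pickled
    form is largest. pickle is only a proxy for Kryo here: the byte
    counts won't match, but the ranking of records usually will.
    """
    return max(((len(pickle.dumps(r)), r) for r in records),
               key=lambda t: t[0])

# Hypothetical sample: one record is far larger than the others.
sample = [
    {"id": 1, "payload": "x" * 10},
    {"id": 2, "payload": "x" * 1_000_000},
    {"id": 3, "payload": "x" * 200},
]
size, record = largest_serialized_record(sample)
```

On an actual cluster you could run the same idea distributed, e.g. something like `df.rdd.map(lambda r: len(pickle.dumps(r))).max()` in PySpark, or serialize with `SparkEnv.get.serializer.newInstance()` on the JVM side to measure the real Kryo sizes.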

I couldn't find anything in the spark ui.

Note that How Kryo serializer allocates buffer in Spark is not the same question. It asks how it works (and actually no one answers it), while I ask how to investigate. In that question, all the answers discuss the parameters to use; I know which parameters to use, and I do manage to avoid the exception by increasing them. However, I already consume too much RAM and need to optimize it, Kryo buffer included.

Upvotes: 1

Views: 1787

Answers (1)

user2846168

Reputation: 41

All data that is sent over the network, written to disk, or persisted in memory is serialized, along with the Spark DAG itself. Hence, the Kryo serialization buffer must be larger than any single object you attempt to serialize, and it must be less than 2048m.

https://spark.apache.org/docs/latest/tuning.html#data-serialization
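For reference, the relevant settings can be passed at submit time; the values below are placeholders to adapt, not recommendations, and `your_app.jar` is a stand-in for your application:

```shell
# spark.kryoserializer.buffer     -- initial per-core buffer (grows on demand)
# spark.kryoserializer.buffer.max -- hard cap on that growth; must stay under 2048m
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer=64k \
  --conf spark.kryoserializer.buffer.max=256m \
  your_app.jar
```

The "buffer overflow" exception is raised when a single object's serialized form exceeds the `buffer.max` cap, which is why the cap has to exceed the largest object rather than the total data volume.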

Upvotes: 0
