Juh_

Reputation: 15549

How to investigate a kryo buffer overflow happening in spark?

I encountered a Kryo buffer overflow exception, but I really don't understand what data could require more than the current buffer size. I already have spark.kryoserializer.buffer.max set to 256m, and even a toString applied to the dataset items, which should be much bigger than what Kryo requires, takes less than that (per item).

I know I can increase this parameter, and I will for now, but I don't think it is good practice to simply increase a resource limit when hitting a bound without investigating what is happening (just as, if I got an OOM, I wouldn't simply increase the RAM allocation without checking what is using more RAM).

=> So, is there a way to investigate what is put into the buffer during the Spark DAG execution?
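Spark doesn't expose per-record serializer usage in the UI, but one way to narrow it down is to sample records and measure their serialized size yourself, to find the outlier that blows the buffer. The sketch below uses Python's pickle as a rough stand-in for Kryo (absolute sizes differ, but the record that serializes largest under one scheme is usually the same one that dominates the Kryo buffer); the sample data is hypothetical.

```python
import pickle

def largest_serialized_record(records):
    """Return (size_in_bytes, record) for the record whose pickled
    form is largest. pickle is only a proxy for Kryo here: the byte
    counts won't match, but the ranking of records usually will.
    """
    return max(((len(pickle.dumps(r)), r) for r in records),
               key=lambda t: t[0])

# Hypothetical sample: one record is far larger than the others.
sample = [
    {"id": 1, "payload": "x" * 10},
    {"id": 2, "payload": "x" * 1_000_000},
    {"id": 3, "payload": "x" * 200},
]
size, record = largest_serialized_record(sample)
```

On an actual cluster you could run the same idea distributed, e.g. something like `df.rdd.map(lambda r: len(pickle.dumps(r))).max()` in PySpark, or serialize with `SparkEnv.get.serializer.newInstance()` on the JVM side to measure the real Kryo sizes.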

I couldn't find anything in the spark ui.

Note that How Kryo serializer allocates buffer in Spark is not the same question. It asks how it works (and actually no one answers it), while I ask how to investigate. In that question, all the answers discuss the parameters to use; I know which parameters to use, and I do manage to avoid the exception by increasing them. However, I already consume too much RAM and need to optimize it, Kryo buffer included.

Upvotes: 1

Views: 1787

Answers (1)

user2846168

Reputation: 41

All data that is sent over the network, written to disk, or persisted in memory is serialized, along with the Spark DAG itself. Hence, the Kryo serialization buffer must be larger than any single object you attempt to serialize, and it must be less than 2048m.

https://spark.apache.org/docs/latest/tuning.html#data-serialization
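For reference, the relevant settings can be passed at submit time; the values below are placeholders to adapt, not recommendations, and `your_app.jar` is a stand-in for your application:

```shell
# spark.kryoserializer.buffer     -- initial per-core buffer (grows on demand)
# spark.kryoserializer.buffer.max -- hard cap on that growth; must stay under 2048m
spark-submit \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer=64k \
  --conf spark.kryoserializer.buffer.max=256m \
  your_app.jar
```

The "buffer overflow" exception is raised when a single object's serialized form exceeds the `buffer.max` cap, which is why the cap has to exceed the largest object rather than the total data volume.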

Upvotes: 0
