Cocoa3338

Reputation: 105

How should I set the parameter "spark.kryoserializer.buffer.mb" in PySpark?

Did I set it correctly? Every time I run the program, it shows this error:

Kryo serialization failed: Buffer overflow. Available: 0, required: 5. To avoid this, increase spark.kryoserializer.buffer.max value.

from pyspark.sql import SQLContext
from pyspark import SparkContext
from pyspark import SparkConf
from graphframes import *

sc = SparkContext("local")
sqlContext = SQLContext(sc)
sqlContext.sql('SET spark.sql.broadcastTimeout=9000')
sqlContext.sql('SET spark.kryoserializer.buffer.max=512')

Upvotes: 1

Views: 7988

Answers (1)

Thiago Baldim

Reputation: 7742

If you want to add configuration to the SQLContext, use setConf().

But if you want to add configuration to the SparkContext, you can use set() on a SparkConf, like this:

conf = SparkConf().setAppName('MY_APP') \
    .set('spark.executor.cores', 4) \
    .set('spark.executor.memory', '16g') \
    .set('spark.driver.memory', '16g') \
    .set('spark.yarn.executor.memoryOverhead', 1024) \
    .set('spark.dynamicAllocation.enabled', 'true') \
    .set('spark.shuffle.service.enabled', 'true') \
    .set('spark.shuffle.service.port', 7337) \
    .set('spark.dynamicAllocation.maxExecutors', 250) \
    .set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')

sc = SparkContext(conf=conf)

Then call setConf on your SQLContext:

sqlContext = SQLContext(sc)
sqlContext.setConf("spark.sql.broadcastTimeout", "9000")
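
For the original error, a minimal sketch of the same idea (assuming a local master and taking the 512m value from the error message; the app name is just a placeholder) is to put the Kryo settings on the SparkConf before the SparkContext is created, since they are core Spark configs and are not picked up from a SQL 'SET' statement:

from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Core Spark configs (serializer, Kryo buffer) must be set on the SparkConf
# that is passed to the SparkContext.
conf = SparkConf().setAppName('MY_APP') \
    .setMaster('local') \
    .set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer') \
    .set('spark.kryoserializer.buffer.max', '512m')  # raise the max Kryo buffer

sc = SparkContext(conf=conf)

# SQL-level settings can still go through setConf on the SQLContext.
sqlContext = SQLContext(sc)
sqlContext.setConf('spark.sql.broadcastTimeout', '9000')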

Upvotes: 1
