ishwar
ishwar

Reputation: 298

How to set KryoSerializer in Pyspark?

i'm new to Pyspark Please help me with it:

spark = SparkSession.builder.appName("FlightDelayRDD").master("local[*]").getOrCreate()
sc = spark.sparkContext
sc.setSystemProperty("spark.dynamicAllocation.enabled", "true")
sc.setSystemProperty("spark.dynamicAllocation.initialExecutors", "6")
sc.setSystemProperty("spark.dynamicAllocation.minExecutors", "6")
sc.setSystemProperty("spark.dynamicAllocation.schedulerBacklogTimeout", "0.5s")
sc.setSystemProperty("spark.speculation", "true")

I want to set KryoSerializer in pyspark like i configured above.

Upvotes: 1

Views: 4475

Answers (1)

notNull
notNull

Reputation: 31540

From official docs:

Since Spark 2.0.0, we internally use Kryo serializer when shuffling RDDs with simple types, arrays of simple types, or string type.

To set Kryo serializer:

sc.setSystemProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

To check:

spark.sparkContext.getConf().get("spark.serializer")

#u'org.apache.spark.serializer.KryoSerializer'

Upvotes: 2

Related Questions