Beta

Reputation: 1746

Spark Parallelism in Standalone Mode

I'm trying to run Spark in standalone mode on my system. The current specification of my system is 8 cores and 32 GB of memory. Based on this article I calculated the Spark configuration as follows:

spark.driver.memory 2g
spark.executor.cores 3
spark.executor.instances 2
spark.executor.memory 20g
maximizeResourceAllocation TRUE

I created the Spark context in my Jupyter notebook like this and checked the parallelism level:

from pyspark import SparkContext

sc = SparkContext()
sc.defaultParallelism

The default parallelism is giving me 8. My question is: why is it giving me 8 even though I specified 2 executors with 3 cores each? If it's not giving me the actual parallelism of my system, then how do I get the actual level of parallelism?

Thank you!

Upvotes: 9

Views: 8324

Answers (3)

Rahul J

Reputation: 91

I had the same issue. My Mac has 1 CPU with only 4 cores, but when I ran

sc.defaultParallelism

I always got 8.

So I kept wondering why that was and finally figured out that hyper-threading is enabled on the CPU, which gives you 8 logical CPUs on the Mac:

$ sysctl hw.physicalcpu hw.logicalcpu
hw.physicalcpu: 4
hw.logicalcpu: 8
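
As a quick cross-check from Python (a minimal sketch; os.cpu_count() also counts logical CPUs, which is what sc.defaultParallelism falls back to in local mode):

import os
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
# os.cpu_count() reports logical CPUs, so with hyper-threading it prints 8 on this 4-core Mac
print("logical CPUs:", os.cpu_count())
print("defaultParallelism:", sc.defaultParallelism)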

Upvotes: 2

Estela Balboa

Reputation: 75

Thank you all. In case someone faces the same need in cluster execution with PySpark (version > 2.3.x): I had to recover the settings with spark.sparkContext.getConf().getAll() and then used Python to get only the value of the spark.default.parallelism key. Just in case! Thanks!
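
A minimal sketch of that lookup (assuming an existing SparkSession named spark; the key may be missing if it was never set explicitly):

# getConf().getAll() returns a list of (key, value) tuples
conf_pairs = dict(spark.sparkContext.getConf().getAll())
# fall back to a marker if the key was never set explicitly
parallelism = conf_pairs.get("spark.default.parallelism", "not set")
print(parallelism)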

Upvotes: 0

Ram Ghadiyaram

Reputation: 29185

sc.defaultParallelism

returns the default level of parallelism defined on the SparkContext. By default, it is the number of cores available to the application.

But to know what settings are pre-applied for the Jupyter notebook, you can print

 sc._conf.getAll()

or from Scala: sc.getConf.getAll.foreach(println)

That should include the property

spark.default.parallelism

I think in this case it's preset, and that's why you are getting 8.
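
If you want to pin it yourself instead of relying on the preset, a minimal sketch (the value 6 is just an example, and it assumes no SparkContext is already running in the notebook):

from pyspark import SparkConf, SparkContext
# explicitly override the preset default parallelism (example value)
conf = SparkConf().set("spark.default.parallelism", "6")
sc = SparkContext(conf=conf)
print(sc.defaultParallelism)  # should now reflect the configured value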

Upvotes: 8
