Reputation: 1746
I'm trying to run Spark in standalone mode on my system. The current specification of my system is 8 cores and 32 GB of memory. Based on this article I calculated the Spark configuration as follows:
spark.driver.memory 2g
spark.executor.cores 3
spark.executor.instances 2
spark.executor.memory 20g
maximizeResourceAllocation TRUE
I created the Spark context in my Jupyter notebook like this and checked the parallelism level with:
sc = SparkContext()
sc.defaultParallelism
The default parallelism is giving me 8. My question is: why is it giving me 8 even though I mentioned 2 cores? If it's not giving me the actual parallelism of my system, then how do I get the actual level of parallelism?
Thank you!
Upvotes: 9
Views: 8324
Reputation: 91
I had the same issue: my Mac has 1 CPU with only 4 cores, but when I ran
sc.defaultParallelism
I always got 8.
So I kept wondering why that was, and finally figured out that hyper-threading is enabled on the CPU, which gives you 8 logical CPUs on the Mac:
$ sysctl hw.physicalcpu hw.logicalcpu
hw.physicalcpu: 4
hw.logicalcpu: 8
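You can see the same split from Python itself. A minimal sketch (it assumes the third-party psutil package is installed; the counts are just what my machine reports):
import os
import psutil  # assumption: psutil is installed (pip install psutil)

print(os.cpu_count())                   # logical CPUs (hyper-threads), e.g. 8
print(psutil.cpu_count(logical=False))  # physical cores, e.g. 4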
Upvotes: 2
Reputation: 75
Thank you all. In case someone has the same need when running in cluster mode with PySpark (version > 2.3.x), I had to retrieve the value like this:
spark.sparkContext.getConf().getAll()
and then I used Python to get only the value of the spark.default.parallelism key.
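Roughly what I did (a minimal sketch; spark is assumed to be an existing SparkSession):
# getAll() returns a list of (key, value) pairs; turn it into a dict and look up the key
conf = dict(spark.sparkContext.getConf().getAll())
print(conf.get("spark.default.parallelism"))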
Just in case!
Thanks!
Upvotes: 0
Reputation: 29185
sc.defaultParallelism
returns the default level of parallelism defined on the SparkContext. By default it is the number of cores available to the application.
But to see which settings are pre-applied for the Jupyter notebook, you can print
sc._conf.getAll()
or from Scala: sc.getConf.getAll.foreach(println)
That should have the property
spark.default.parallelism
I think in this case it's preset, and that's why you are getting 8.
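If you don't want to rely on the preset, you can set it yourself when building the context. A minimal sketch (the master URL and the values are only illustrative, not what Jupyter pre-applies):
from pyspark import SparkConf, SparkContext

# illustrative values: 3 local cores and an explicit default parallelism
conf = SparkConf().setMaster("local[3]").set("spark.default.parallelism", "3")
sc = SparkContext(conf=conf)
print(sc.defaultParallelism)  # 3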
Upvotes: 8