Reputation: 2825
I am trying to use Scala's Futures API to run certain actions in parallel. Below is a sample code snippet:
import java.util.concurrent.Executors

import scala.concurrent._
import scala.util._

import org.apache.spark.sql.SparkSession

object ConcurrentContext {

  def appMain(args: Array[String]): Unit = {
    // configure Spark
    val spark = SparkSession
      .builder
      .appName("parjobs")
      .getOrCreate()

    // fixed-size pool of 5 threads for submitting work from the driver
    val pool = Executors.newFixedThreadPool(5)
    // create the implicit ExecutionContext based on our thread pool
    implicit val xc: ExecutionContextExecutorService = ExecutionContext.fromExecutorService(pool)

    /** Wraps a code block in a Future and returns the future */
    def executeAsync[T](f: => T): Future[T] = {
      Future(f)
    }
  }
}
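For reference, the intended usage looks roughly like this (a sketch only; the table names and the count() action are placeholders for whatever actions would run in parallel):

import scala.concurrent.duration.Duration

// inside appMain, after the definitions above:
// fire several Spark actions concurrently on the 5-thread pool and wait for all of them
val futures = Seq("tbl_a", "tbl_b", "tbl_c").map { table =>
  executeAsync {
    spark.read.table(table).count()   // placeholder action; each action triggers one Spark job
  }
}
Await.result(Future.sequence(futures), Duration.Inf)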
My questions are:

1. If I set the executor-cores value to 4, which controls the number of concurrent task threads per executor JVM, and also create a thread pool of 5 inside the application, which one takes precedence?
2. If I don't explicitly set the thread pool, the default ExecutionContext creates a thread pool based on all the cores of the machine where the process is initiated (which would be the driver). In that situation, how does the executor-cores property come into play?
3. If the thread pool value takes precedence over executor-cores and I use the default, is there a possibility of ending up with many threads (equal to the number of CPU cores) per JVM?
Upvotes: 2
Views: 4374
Reputation: 74729
If I set the executor-cores value to 4, which controls the number of concurrent task threads per executor JVM, and also create a thread pool of 5 inside the application, which one takes precedence?
When you execute a Spark application you have the driver and one or more executors. For the sake of simplicity, let's assume you have one executor only.
You have 4 CPUs for the executor.
How many tasks can you run in parallel with 4 CPUs? 4 exactly!
The driver is the part of the Spark application where your thread pool of 5 threads lives. For the sake of simplicity, let's assume that all 5 threads are used.
How many Spark jobs can you submit concurrently? Exactly 5!
Every Spark job can have one or more stages with one or more partitions to process using tasks. For the sake of simplicity, let's assume that all 5 Spark jobs have 1 stage with 1 partition (which is highly unlikely, but just to give you an idea of how Spark works it should be fine).
Remember that 1 partition is exactly 1 task.
How many tasks will the Spark application submit? 5 jobs with 1 task each gives 5 tasks.
How much time does it take to execute all 5 tasks on a 4-CPU executor? 2 time slots (whatever "time slot" could mean): 4 tasks run in parallel in the first slot, and the remaining task runs in the second.
That's the relationship between executor cores/CPUs and the thread pool of 5 threads on the driver.
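To make that concrete, here is a minimal sketch reusing the executeAsync helper and the 5-thread pool from your code (the range/count action is only a placeholder):

import scala.concurrent.duration.Duration

// 5 driver-side threads each submit one Spark job; the pool caps concurrent submissions at 5
val jobs = (1 to 5).map { _ =>
  executeAsync {
    spark.range(0L, 1000000L).count()   // one action => one Spark job with one or more tasks
  }
}
// with executor-cores=4 the executor runs at most 4 of those jobs' tasks at a time;
// the fifth task waits for a free core, so the two settings limit different things
Await.result(Future.sequence(jobs), Duration.Inf)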
If I don't explicitly set the thread pool, the default ExecutionContext creates a thread pool based on all the cores of the machine where the process is initiated (which would be the driver). In that situation, how does the executor-cores property come into play?
I think the above part answers this question too.
If the thread pool value takes precedence over executor-cores and I use the default, is there a possibility of ending up with many threads (equal to the number of CPU cores) per JVM?
The part above should answer this one as well. Correct?
Upvotes: 3