Reputation: 8010
I am running a Spark job with the following cluster and application configuration:
Total Nodes: 3
Master Node: 7.5GB memory, 2 cores
Worker Node 1: 15GB memory, 4 cores
Worker Node 2: 15GB memory, 4 cores
Application Configuration:
--master yarn --num-executors 2 --executor-cores 2 --executor-memory 2G
I am trying to submit multiple jobs at the same time with the same user, but I see that only the first two submitted jobs execute and the third has to wait, with the following warning:
19/11/19 08:30:49 WARN org.apache.spark.util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/19 08:30:49 WARN org.apache.spark.util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
I found that a SparkUI is being created for every submitted job, and my cluster is accepting only two jobs at a time. I also observed that the third job was picked up on port 4042 once the first submitted job finished execution. What could be wrong with my cluster that it is accepting only two jobs at a time?
Here is my Spark Session Code:
import org.apache.spark.sql.SparkSession

val spark: SparkSession = {
  val sparkSession = SparkSession
    .builder()
    .appName("Data Analytics")
    .config("spark.scheduler.mode", "FAIR")
    //.master("local[*]")
    .getOrCreate()
  sparkSession
}
Further, my questions are: why is SparkSession creating a SparkUI for each job, and how can we solve this? Is there any way to use the same session for multiple jobs?
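To make the last question concrete, here is roughly what I mean by sharing one session: a single application that fires several actions from separate threads, so they run as parallel jobs under the FAIR scheduler instead of as separate spark-submit runs. This is only a sketch; the pool name "analytics" and the dummy range/count actions are placeholders, not my real workload.

import org.apache.spark.sql.SparkSession
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object SharedSessionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Data Analytics")
      .config("spark.scheduler.mode", "FAIR")
      .getOrCreate()

    // Each future runs its own action against the same session,
    // so each shows up as a separate job inside this single application.
    val jobA = Future {
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "analytics")
      spark.range(0L, 1000000L).count()
    }
    val jobB = Future {
      spark.sparkContext.setLocalProperty("spark.scheduler.pool", "analytics")
      spark.range(0L, 1000000L).filter("id % 2 == 0").count()
    }

    println(Await.result(jobA, Duration.Inf))
    println(Await.result(jobB, Duration.Inf))
    spark.stop()
  }
}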
Upvotes: 0
Views: 1926
Reputation: 4133
There are several things to take into account. Every time you execute spark-submit, a new Spark application is created (in client mode a new driver is started on the submitting machine), and the driver's web UI binds to port 4040. That is the reason for the warning: you are starting another application with another driver, but port 4040 is already in use, so it tries 4041. Also, a Spark job is not a Spark application: a job is the execution that corresponds to a Spark action, so the number of jobs spawned depends on the number of actions your program executes.
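To make the application/job distinction concrete, here is a small sketch (the dataset and the two actions are just placeholders): submitting this once with spark-submit gives you one application, one driver and one UI on port 4040, but it produces two jobs because it executes two actions.

import org.apache.spark.sql.SparkSession

object OneAppTwoJobs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("one-app-two-jobs").getOrCreate()

    val ds = spark.range(0L, 1000000L)
    val total = ds.count()                        // action #1 -> job 0
    val evens = ds.filter("id % 2 == 0").count()  // action #2 -> job 1

    println(s"total=$total, evens=$evens")
    spark.stop()
  }
}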
In your case, each submission asks for two executors with two cores each; in other words, two JVMs with two cores apiece, in addition to the driver. Because you are using YARN, it will try to provide those 4 cores for each of your applications, plus one for each driver.
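For reference, the resources each of your submissions asks YARN for can also be expressed in the session builder; this is only a sketch of what your spark-submit flags translate to, assuming the default memory overhead of max(384MB, 10% of executor memory):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("Data Analytics")
  .config("spark.executor.instances", "2")  // same as --num-executors 2: two executor containers
  .config("spark.executor.cores", "2")      // same as --executor-cores 2: 4 executor cores per application
  .config("spark.executor.memory", "2g")    // same as --executor-memory 2G, plus per-container overhead
  .getOrCreate()

With two 4-core workers (8 cores in total), two such applications plus their ApplicationMasters already claim most of what YARN can hand out, which is why the third submission stays queued until one of the earlier applications finishes.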
For more info check this link: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-scheduler-ActiveJob.html
Upvotes: 1