Reputation: 3186
I am currently trying to deploy two Spark applications, and I want to restrict the cores and executors per application. My config is as follows:
spark.executor.cores=1
spark.driver.cores=1
spark.cores.max=1
spark.executor.instances=1
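For reference, here is a minimal sketch (Scala, Spark Streaming) of how these same properties might be set programmatically through SparkConf instead of a properties file; the application name and the 5-second batch interval are illustrative placeholders, not taken from the actual application:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Same limits as in the properties above; app name and batch interval are hypothetical
val conf = new SparkConf()
  .setAppName("restricted-streaming-app")
  .set("spark.executor.cores", "1")
  .set("spark.driver.cores", "1")
  .set("spark.cores.max", "1")
  .set("spark.executor.instances", "1")

// 5-second batches, matching the spacing of the log timestamps below
val ssc = new StreamingContext(conf, Seconds(5))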
Now the issue is that with this exact configuration, one streaming application works while the other doesn't. The application that doesn't work remains in the RUNNING state and continuously prints the following messages in the logs:
17/03/06 10:31:50 INFO JobScheduler: Added jobs for time 1488814310000 ms
17/03/06 10:31:55 INFO JobScheduler: Added jobs for time 1488814315000 ms
Surprisingly, if I change the configuration to the following, the same application that was not working now proceeds without any problem.
spark.executor.cores=3
spark.driver.cores=1
spark.cores.max=3
spark.executor.instances=3
Note: the application does not work with a value of 2, which is why I use a minimum of 3.
It thus appears that some streaming applications need more cores than others. My question is: what determines how many resources an application needs? Why is one application unable to run with a single core when it can run with 3 cores?
Upvotes: 0
Views: 487
Reputation: 20826
How many receivers are you using? You must make sure there are enough cores for running receivers and Spark jobs:
- A DStream is associated with a single receiver. For attaining read parallelism, multiple receivers (i.e. multiple DStreams) need to be created.
- A receiver is run within an executor and occupies one core.
- Ensure that there are enough cores for processing after the receiver slots are booked, i.e. spark.cores.max should take the receiver slots into account.
- The receivers are allocated to executors in a round-robin fashion.
http://spark.apache.org/docs/latest/streaming-programming-guide.html#important-points-to-remember
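As a hedged illustration of this point (not your actual application): a receiver-based input such as socketTextStream occupies one core per receiver, so with spark.cores.max=1 the single core can be consumed by the receiver and nothing is left to process the queued batches, which is consistent with the "Added jobs for time ... ms" lines piling up without any batch completing. The sketch below, with placeholder host/port values, reserves one core per receiver plus at least one core for processing:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ReceiverCoreExample {
  def main(args: Array[String]): Unit = {
    val numReceivers = 2                        // illustrative receiver count

    val conf = new SparkConf()
      .setAppName("receiver-core-example")      // hypothetical name
      // each receiver occupies one core, plus at least one core for batch processing
      .set("spark.cores.max", (numReceivers + 1).toString)

    val ssc = new StreamingContext(conf, Seconds(5))

    // One receiver per input DStream; union them for read parallelism
    val streams = (1 to numReceivers).map(i => ssc.socketTextStream("localhost", 9998 + i))
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}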
Upvotes: 1