Reputation: 45692
Suppose you have a Spark cluster with the Standalone manager, where jobs are scheduled through a SparkSession
created in a client app. The client app runs on the JVM, and each job has to be launched with different configs for the sake of performance; see the job types example below.
The problem is that you can't create two sessions from a single JVM.
So how would you launch multiple Spark jobs with different session configs simultaneously?
By different session configs I mean:
spark.executor.cores
spark.executor.memory
spark.kryoserializer.buffer.max
spark.scheduler.pool
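For context, each of these settings can be passed per application at submit time with `spark-submit`; a minimal fragment (the master URL, values, and jar name are illustrative assumptions):

```
spark-submit \
  --master spark://master:7077 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=8g \
  --conf spark.kryoserializer.buffer.max=512m \
  --conf spark.scheduler.pool=production \
  app.jar
```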
Possible ways to solve the problem:

1. Change the configs per job on the same SparkSession at runtime. Is it possible?
2. Keep a pool of pre-created SparkSession instances, something that I could call a Spark session service. But you never know how many jobs with different configs you are going to launch simultaneously in the future. At the moment I need only 2-3 different configs at a time. That may be enough, but it's not flexible. Upvotes: 1
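One workaround consistent with the one-session-per-JVM limit is to run each job as its own `spark-submit` child process, so every job gets its own JVM, its own SparkSession, and its own configs. A hedged Python sketch (the master URL, jar name, and job configs below are illustrative assumptions, not from the question):

```python
import subprocess

# Illustrative per-job configs; the question needs 2-3 of these at a time.
JOB_CONFIGS = {
    "etl": {"spark.executor.cores": "2", "spark.executor.memory": "4g"},
    "ml":  {"spark.executor.cores": "4", "spark.executor.memory": "8g"},
}

def submit_args(job_name, conf):
    """Build the spark-submit argument list for one job's config dict."""
    args = ["spark-submit", "--master", "spark://master:7077",
            "--name", job_name]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append("app.jar")  # hypothetical application jar
    return args

def launch_all(configs):
    # One child JVM per job; the processes run simultaneously.
    return [subprocess.Popen(submit_args(name, conf))
            for name, conf in configs.items()]

# On a machine with Spark on PATH:
# procs = launch_all(JOB_CONFIGS)
# for p in procs:
#     p.wait()
```

The trade-off is submission overhead per job (a new JVM each time) in exchange for full config isolation.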
Views: 753
Reputation: 2281
Spark standalone uses a simple FIFO scheduler for applications. By default, each application uses all the available nodes in the cluster. The number of nodes can be limited per application, per user, or globally. Other resources, such as memory, CPUs, etc., can be controlled via the application's SparkConf object.
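To illustrate the per-application and cluster-wide limits mentioned above, these are real standalone settings (values are illustrative):

```
# In the application's SparkConf or spark-defaults.conf:
spark.cores.max           8    # cores cap for this application

# On the standalone master, for apps that don't set spark.cores.max:
spark.deploy.defaultCores 4
```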
Apache Mesos has master and slave processes. The master makes offers of resources to the application (called a framework in Apache Mesos), which either accepts the offer or not. Thus, claiming available resources and running jobs is determined by the application itself. Apache Mesos allows fine-grained control of the resources in a system, such as CPUs, memory, disks, and ports. Apache Mesos also offers coarse-grained control of resources, where Spark allocates a fixed number of CPUs to each executor in advance, which are not released until the application exits. Note that in the same cluster, some applications can be set to use fine-grained control while others are set to use coarse-grained control.
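The coarse- vs fine-grained choice on Mesos is made per application through a single Spark property; a config fragment (fine-grained mode was deprecated in later Spark releases):

```
# true (default) = coarse-grained: executors hold their CPUs until the app exits
# false          = fine-grained: CPUs are claimed and released per task
spark.mesos.coarse  true
```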
Apache Hadoop YARN has a ResourceManager with two parts, a Scheduler and an ApplicationsManager. The Scheduler is a pluggable component. Two implementations are provided: the CapacityScheduler, useful in a cluster shared by more than one organization, and the FairScheduler, which ensures all applications, on average, get an equal share of resources. Both schedulers assign applications to queues, and each queue gets resources that are shared equally between them. Within a queue, resources are shared between the applications. The ApplicationsManager is responsible for accepting job submissions and starting the application-specific ApplicationMaster. In this case, the ApplicationMaster is the Spark application. In the Spark application, resources are specified in the application's SparkConf object.
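On YARN, the queue an application is assigned to can be chosen at submit time; a fragment (the queue name is an illustrative assumption):

```
spark-submit --master yarn --queue analytics app.jar

# or equivalently, in the application's SparkConf:
spark.yarn.queue  analytics
```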
For your case, with standalone alone it is not possible; there may be some workarounds, but I haven't come across any.
Upvotes: 1