midhunxavier

Reputation: 66

Multiple Spark application submission in standalone mode

I have 4 Spark applications (each computing a word count from a text file), written in 4 different languages (R, Python, Java, Scala):

./wordcount.R
./wordcount.py
./wordcount.java
./wordcount.scala

Spark runs in standalone mode with:

1. 4 worker nodes
2. 1 core for each worker node
3. 1 GB memory for each node
4. core_max set to 1

./conf/spark-env.sh

export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"

export SPARK_WORKER_OPTS="-Dspark.deploy.defaultCores=1"

export SPARK_WORKER_CORES=1

export SPARK_WORKER_MEMORY=1g

export SPARK_WORKER_INSTANCES=4

I submitted the Spark applications using a pgm.sh file from the terminal:

./bin/spark-submit  --master spark://-Aspire-E5-001:7077 ./wordcount.R  &

./bin/spark-submit  --master spark://-Aspire-E5-001:7077 ./wordcount.py &

./bin/spark-submit  --master spark://-Aspire-E5-001:7077 ./project_2.jar &

./bin/spark-submit  --master spark://-Aspire-E5-001:7077 ./project_2.jar 

When each application is executed individually, it takes about 2 seconds. When all of them are executed together via the .sh file, it takes 5 to 6 seconds.

How do I run the different Spark applications in parallel? How do I assign each Spark application to an individual core?

Upvotes: 0

Views: 507

Answers (1)

Alpha Bravo

Reputation: 179

Edit: In standalone mode, per the documentation, you simply need to set spark.cores.max to something less than the size of your standalone cluster; it is also advisable to set spark.deploy.defaultCores for applications that don't set spark.cores.max explicitly.
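As a minimal sketch (assuming the 4-core cluster from the question and that a conf/spark-defaults.conf file is used; the property names come from the standalone-mode docs, the values are only illustrative), capping each application at one core would look like this:

# conf/spark-defaults.conf
spark.cores.max            1   # each application claims at most 1 core
spark.deploy.defaultCores  1   # fallback for applications that don't set spark.cores.max

# or per application at submit time:
./bin/spark-submit --master spark://-Aspire-E5-001:7077 --conf spark.cores.max=1 ./wordcount.py &

With 4 single-core workers and every application capped at 1 core, the standalone scheduler can place the 4 submissions on separate workers and run them side by side instead of queueing them.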

Original Answer (assuming this was running in something like local or YARN): When you submit multiple Spark applications, they should run in parallel automatically, provided that the cluster or server you are running on is configured to allow it. A YARN cluster, for example, runs applications in parallel by default. Note that the more applications you run in parallel, the greater the risk of resource contention.

On "how to assign each spark application to individual core": you don't, Spark handles the scheduling of workers to cores. You can configure how many resources each of your worker executors uses, but the allocation of them is up to the scheduler (be that Spark, Yarn, or some other scheduler).

Upvotes: 1
