csifreeman

Reputation: 21

Running a single job across multiple workers in Apache Spark

I am trying to understand how Spark splits a single job (a Scala program built with sbt package; the resulting jar is run with the spark-submit command) across multiple workers.

For example: I have two workers (512 MB memory each). When I submit a job, it gets allocated to only one worker (as long as the driver memory is less than the worker memory). If the driver memory is more than the worker memory, the job is not allocated to any worker (even though the combined memory of both workers is higher than the driver memory) and stays in the submitted state. It only moves to the running state once a worker with the required memory becomes available in the cluster.

I want to know whether one job can be split across multiple workers and run in parallel. If so, can anyone help me with the specific steps involved?

Note: the Scala program requires a lot of JVM memory since I will be using a large ArrayBuffer, hence my attempt to split the job across multiple workers.
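
For reference, here is a rough sketch of the kind of program I mean (the buffer contents and sizes are just placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.collection.mutable.ArrayBuffer

    object BigBufferJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("BigBufferJob")
        val sc = new SparkContext(conf)

        // The whole buffer lives in a single JVM heap (the driver),
        // which is what runs into the 512 MB limit on one worker.
        val buffer = ArrayBuffer.tabulate(1000000)(i => i.toDouble)

        // This map/sum runs only in that single JVM. Can this work be
        // spread across both workers instead?
        val result = buffer.map(x => x * x).sum
        println(result)

        sc.stop()
      }
    }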

Thanks in advance!!

Upvotes: 2

Views: 3604

Answers (2)

Daniel Darabos

Reputation: 27455

Make sure your RDD has more than one partition (rdd.partitions.size). Make sure you have more than one executor connected to the driver (http://localhost:4040/executors/).
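
As a rough illustration (the input path and partition count below are just examples):

    // In spark-shell or in your driver program:
    val rdd = sc.textFile("hdfs:///some/input")   // example input

    // How many partitions does it have? One partition = one task.
    println(rdd.partitions.size)

    // If it only has one partition, split it up so tasks can be
    // scheduled on several executors.
    val repartitioned = rdd.repartition(4)

    // Actions on the repartitioned RDD run one task per partition,
    // spread across the connected executors.
    println(repartitioned.count())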

If both of these are fulfilled, your job should run on multiple executors in parallel. If not, please include code and logs in your question.

Upvotes: 0

Krishna

Reputation: 3

Please check whether the array you will be using is parallelized. Then, when you perform an action on it, the work should run in parallel across the nodes.
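
For example, something along these lines (the numbers and partition count are arbitrary):

    // Turn a local array into an RDD distributed across the cluster.
    val data = Array(1, 2, 3, 4, 5)
    val distData = sc.parallelize(data, 4)   // 4 partitions, as an example

    // The map and reduce run as tasks on the worker nodes.
    val total = distData.map(_ * 2).reduce(_ + _)
    println(total)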

Check out this page for reference: http://spark.apache.org/docs/0.9.1/scala-programming-guide.html

Upvotes: 0
