Nizan Grauer

Reputation: 71

Spark cores and memory management

We're running Spark 1.4.0 on EC2, with 6 machines of 4 cores each. We're trying to run an application with a given total-executor-cores, but we want it to run on as few machines as possible (e.g. for total-executor-cores=4 we'd want a single machine; for total-executor-cores=12 we'd want 3 machines).

I'm running spark-shell with the following command:

/root/spark/bin/spark-shell --total-executor-cores X --executor-cores 4

or

/root/spark/bin/spark-shell --total-executor-cores X

and then checked the cores in the Spark UI, and found the following:

Req total-executor-cores | Actual cores (with --executor-cores 4) | Actual cores (without --executor-cores)
24                       | 24                                     | 24
22                       | 22                                     | 16
20                       | 20                                     | 8
16                       | 16                                     | 0
12                       | 12                                     | 0
8                        | 8                                      | 0
4                        | 4                                      | 0

Our questions:

  1. Why don't we always get the number of cores we asked for when passing the "executor-cores 4" parameter? It seems that the number of cores we actually get is something like max(24 - (24 - REQ_TOTAL_CORES) * 4, 0).
  2. How can we get our original request, i.e. the requested cores on a minimal number of machines? When playing with executor-cores we have the problem described in (1), but the cores are at least on a minimal number of machines.
  3. Playing with the spark.deploy.spreadOut parameter didn't seem to help with our request (see the note after this list for how we understand it has to be set).
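In case it matters, this is roughly how we understand spark.deploy.spreadOut has to be set (our assumption from the standalone docs: it is a master-side property passed through SPARK_MASTER_OPTS, so it can't just be set from the application; the paths below match our EC2 layout but may differ):

# in conf/spark-env.sh on the master node
export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"

# restart the master so the property is picked up
/root/spark/sbin/stop-master.sh
/root/spark/sbin/start-master.sh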

Thanks,

nizan

Upvotes: 0

Views: 746

Answers (1)

Joseratts

Reputation: 97

If I am not wrong, by default Spark is configured as 1 executor per cluster node. I'm not an expert, but I think what you want to do is something like this for 1 machine and 4 cores (assuming you have the default configuration):

/root/spark/bin/spark-shell --executor-cores 4 --num-executors 1

And this for 3 machines and 12 cores:

/root/spark/bin/spark-shell --total-executor-cores 12 --num-executors 3

As I said before, I'm not an expert, but I think --executor-cores 4 means every executor (by default every cluster node) will use 4 cores, while --total-executor-cores 12 means that 12 cores will be used across all executors. Also, I think by default Spark tries to use all the cores available given the number of executors, so maybe the first command to launch spark-shell would be enough with just:

/root/spark/bin/spark-shell --num-executors 1

Because every executor has 4 cores (again, assuming you have default configuration).
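If it helps, as far as I know those flags are just shortcuts for config properties (spark.cores.max for --total-executor-cores and spark.executor.cores for --executor-cores, if I'm not mistaken), so you could also pass them with --conf, something like this (not tested, just my understanding):

# should be equivalent to --total-executor-cores 12 --executor-cores 4
/root/spark/bin/spark-shell --conf spark.cores.max=12 --conf spark.executor.cores=4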

Let me know if it works for you

Upvotes: -1
