Reputation: 1847
I'm trying to understand the current Spark situation in this picture.
What It Looks Like to Me
pyspark-shell
uses 10 cores on each machine and 32 GB of RAM on each machine
backtestin2
uses 2 or 6 cores on each machine and 8 GB on each machine
(Note: I am not sure how the jobs have been split among the nodes.)
My Expectation
pyspark-shell
uses 10 cores on each machine and 32 GB of RAM on each machine FOR EACH CORE = 320 GB used in total
backtestin2
uses 16 cores split among the machines, and each core requires 8 GB on each machine = 128 GB in total
Does this mean that the memory per node is shared among all the tasks running on a node for a specific app? I thought that the property
conf.set('spark.executor.memory', executor_memory)
should have been per task.
Rationale:
I do know how much memory each task needs, but I don't know how many tasks go into each executor; therefore I cannot estimate the per-executor memory.
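For reference, here is a minimal sketch of the kind of configuration I am talking about (the app name and values are just placeholders, not my real settings):

from pyspark import SparkConf, SparkContext

# Placeholder value: each of my tasks needs roughly this much memory, but I don't
# know how many tasks will end up sharing a single executor's memory.
executor_memory = '8g'

conf = (SparkConf()
        .setAppName('backtestin2')                       # illustrative app name
        .set('spark.executor.memory', executor_memory)   # is this per task or per executor?
        .set('spark.cores.max', '16'))                   # total cores requested on the cluster
sc = SparkContext(conf=conf)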
Upvotes: 1
Views: 979
Reputation: 3402
Does this mean that the memory per node is shared among all the tasks running on a node for a specific app?
That is correct: memory per node refers to the total memory allocated to an application on each node. This memory is further split up according to the Spark memory configuration (http://spark.apache.org/docs/latest/configuration.html#memory-management).

When estimating memory requirements, you need to take into account how much memory will be used for storage (i.e. cached DataFrames/RDDs) and how much for execution. By default, half of that memory is set aside for execution of tasks and half for storage. The number of tasks that can run in parallel is also configurable (it defaults to the number of cores).

Given that half the memory is used for execution, and assuming you have partitioned your data appropriately, the total amount of memory needed to run your application with the default configuration is roughly 2 * (number of tasks run in parallel) * (memory needed to run one of the largest tasks). Of course, this estimate is highly dependent on your specific use case, configuration, and implementation. There are further memory-related tips at https://spark.apache.org/docs/latest/tuning.html. Hopefully the Spark UI will improve in the future to provide clearer insight into memory usage.
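To make that estimate concrete, here is a small back-of-the-envelope sketch in Python. The numbers are hypothetical and only illustrate the formula above, scoped to a single executor:

# Hypothetical numbers, purely for illustration.
parallel_tasks_per_executor = 8   # defaults to the number of executor cores (spark.executor.cores)
largest_task_memory_gb = 1.5      # memory needed to run one of the largest tasks

# With the default split (roughly half of the memory available for execution),
# budget about twice the execution requirement:
executor_memory_gb = 2 * parallel_tasks_per_executor * largest_task_memory_gb
print(executor_memory_gb)         # 24.0 -> e.g. conf.set('spark.executor.memory', '24g')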
Upvotes: 1