Lorenzo Belli

Reputation: 1847

How to read Spark UI

I'm trying to understand the current Spark situation in this picture.

What It Looks Like to Me

(Note: I am sure about how jobs have been split among nodes.)

My Expectation

Does this mean that the Memory per node is shared among all the tasks running on a node for a specific app? I thought that the property conf.set('spark.executor.memory', executor_memory) should have been per task.

Rationale:

I do know how much memory each task needs, but I don't know how many tasks go into each executor; therefore I cannot estimate the per-executor memory.
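For context, this is roughly how I'm setting the property (a minimal sketch; the app name and memory value are just placeholders):

    from pyspark import SparkConf, SparkContext

    # Placeholder value: I sized this assuming it applied per task, not per executor.
    executor_memory = "2g"

    conf = SparkConf().setAppName("my_app")
    conf.set("spark.executor.memory", executor_memory)

    sc = SparkContext(conf=conf)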

[Spark UI screenshot]

Upvotes: 1

Views: 979

Answers (1)

Brian Cajes

Reputation: 3402

Does this mean that the Memory per node is shared among all the tasks running on a node for a specific app?

That is correct: memory per node refers to the total memory allocated for an application on each node. This memory is further split up according to Spark's memory configurations (http://spark.apache.org/docs/latest/configuration.html#memory-management).

When estimating memory requirements, one needs to take into account how much memory will be used for storage (i.e. cached dataframes/RDDs) and how much for execution. By default, half of the memory is set aside for execution of tasks and half for storage. Also configurable is the number of tasks that can be run in parallel (it defaults to the number of cores).

Given that half the memory is used for execution, and assuming you have partitioned your data appropriately, the total amount of memory needed to run your application with default configurations is about 2 * (number of tasks run in parallel) * (memory needed to run one of the largest tasks). Of course, this estimate is highly dependent on your specific use case, configuration, and implementation. There are further memory-related tips in https://spark.apache.org/docs/latest/tuning.html. Hopefully the Spark UI improves in the future to provide clearer insights into memory usage.
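As a rough, made-up illustration of that estimate (assuming 4 cores per executor and about 1.5 GB per task; spark.executor.cores and spark.executor.memory are the standard config keys):

    from pyspark import SparkConf

    # Back-of-the-envelope sizing under default memory settings (illustrative numbers only).
    cores_per_executor = 4          # tasks that can run in parallel per executor
    largest_task_memory_gb = 1.5    # estimated memory needed by one of the largest tasks

    # With roughly half of the memory reserved for execution, the total per-executor
    # memory is about twice what the parallel tasks need at once.
    executor_memory_gb = 2 * cores_per_executor * largest_task_memory_gb   # = 12 GB

    conf = (SparkConf()
            .set("spark.executor.cores", str(cores_per_executor))
            .set("spark.executor.memory", "{}g".format(int(executor_memory_gb))))

Treat the result as a starting point and verify against the Storage and Executors tabs in the Spark UI for your actual workload.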

Upvotes: 1
