user9694017

Spark UI on Google Dataproc: numbers interpretation

I'm running a Spark job on a Google Dataproc cluster (3 worker nodes of type n1-highmem-4, so 4 cores and 26GB of RAM each; the master is the same type). I have a few questions about the information displayed in the Hadoop and Spark UIs:

When I check the Hadoop UI I get this:

1) Hadoop UI screenshot

My question here is: my total RAM should be 78GB (3x26), so why is only 60GB displayed here? Is the other 18GB used for something else?

2) Spark Executor screenshot

This is the screen showing the currently launched executors. My questions here are: why do I see only 5 executors instead of 6, and why are only 10 of my 12 worker cores in use?

3) Storage panel screenshot

This is a screenshot of the "Storage" panel. I see the DataFrame I'm working on, but I don't understand the "Size in Memory" column. Is it the total RAM used to cache the DataFrame? It seems very low compared to the size of the raw files I load into the DataFrame (500GB+). Am I misinterpreting it?

Thanks to anyone who will read this!

Upvotes: 2

Views: 365

Answers (1)

Henry Gong

Reputation: 316

Take a look at this answer; it mostly covers your questions 1 and 2.

To sum up, the total memory is lower because some memory on each node is reserved for the OS and system daemons, as well as for the Hadoop daemons themselves, e.g. the NameNode and NodeManager.
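As a rough back-of-the-envelope check, the numbers line up if Dataproc hands YARN only part of each node's RAM. The 0.8 ratio below is an assumed ballpark for illustration, not the exact Dataproc setting:

    # Back-of-the-envelope check: YARN gets only part of each node's RAM.
    # The 0.8 fraction is an assumption, not a value read from the cluster.
    node_ram_gb = 26
    workers = 3

    yarn_fraction = 0.8                               # rest goes to OS/daemons
    yarn_per_node_gb = node_ram_gb * yarn_fraction    # ~20.8 GB per node
    cluster_yarn_gb = workers * yarn_per_node_gb      # ~62 GB, close to the 60 GB shown

    print(round(yarn_per_node_gb, 1), round(cluster_yarn_gb, 1))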

The same applies to cores. In your case there are 3 nodes, each node runs 2 executors, and each executor uses 2 cores, except on the node hosting the YARN application master. That node has room for only one executor, and its remaining cores go to the application master. That's why you see only 5 executors and 10 cores.
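To make that arithmetic explicit, here is a minimal sketch of the layout described above (the per-node figures are the defaults implied by this answer, not values read from your cluster):

    # Executor layout implied by the answer above (assumed defaults).
    nodes = 3
    executors_per_node = 2
    cores_per_executor = 2

    total_executors = nodes * executors_per_node            # 6 on paper
    # The YARN application master occupies one container, so the node it
    # lands on only fits a single executor:
    visible_executors = total_executors - 1                 # 5, as in the Spark UI
    visible_cores = visible_executors * cores_per_executor  # 10

    print(visible_executors, visible_cores)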

For your 3rd question, that number should be the memory used by the partitions of that RDD currently held in memory, which is approximately equal to the memory allocated to each executor; in your case that's ~13GB.

Note that Spark doesn't load your 500GB of data all at once; instead it loads it in partitions, and the number of concurrently loaded partitions depends on the number of cores you have available.
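A minimal PySpark sketch of that behaviour; the bucket path is hypothetical and only illustrates the pattern from the question:

    # Minimal sketch: Spark materializes a cached DataFrame partition by
    # partition, so the Storage tab only grows as partitions are computed.
    # The path below is hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("storage-demo").getOrCreate()

    df = spark.read.parquet("gs://my-bucket/raw-files/")
    df.cache()   # lazy: nothing is read or cached yet
    df.count()   # action: partitions are read, processed and cached as they go

    # Number of partitions the input was split into; at most
    # (total available cores) of them are being loaded at any one time.
    print(df.rdd.getNumPartitions())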

Upvotes: 1
