blue-sky

Reputation: 53806

Understanding Spark monitoring UI

For a running Spark job, here is part of the UI detail page at http://localhost:4040/stages/stage/?id=1&attempt=0:

[screenshot of the Spark UI stage detail page showing four tasks with Input, Write Time and Shuffle Write columns]

The documentation at http://spark.apache.org/docs/1.2.0/monitoring.html does not explain each of these parameters. What do the columns "Input", "Write Time" and "Shuffle Write" indicate?

As can be seen from this screenshot, these 4 tasks have been running for 1.3 minutes, and I'm trying to determine whether there is a bottleneck and, if so, where it is occurring.

Spark is configured to use 4 cores; I think this is why there are 4 tasks displayed in the UI. Is each task running on a single core?

What determines the "Shuffle Write" sizes?

My console output contains many log messages like these:

15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:33 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:0+103306
15/02/11 20:55:34 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
15/02/11 20:55:34 INFO rdd.HadoopRDD: Input split: file:/c:/data/example.txt:103306+103306
.....................

Are these the result of the file being split into multiple smaller chunks, with each "Input" of size 100.9 KB (shown in the Spark UI screenshot) mapping to one of these splits?
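For reference, here is a minimal sketch of the kind of job that produces input splits like the ones above (this is not my exact code; the word-count reduceByKey step is only a stand-in for the transformation that triggers the shuffle):

import org.apache.spark.{SparkConf, SparkContext}

object SplitExample {
  def main(args: Array[String]): Unit = {
    // local[4] starts a single in-process backend with 4 task slots
    val conf = new SparkConf().setAppName("SplitExample").setMaster("local[4]")
    val sc = new SparkContext(conf)

    // textFile is backed by a HadoopRDD; the optional second argument is the
    // minimum number of partitions, which influences how the file is split
    val lines = sc.textFile("file:/c:/data/example.txt", 2)

    // an assumed shuffle step (word count); any reduceByKey/groupByKey/join
    // forces the map-side output to be written out as "Shuffle Write"
    val counts = lines.flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    counts.count()
    sc.stop()
  }
}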

Upvotes: 5

Views: 5798

Answers (2)

user3648294

Reputation:

Input is the size of the data that your Spark job is ingesting, for example the data that each map task you have defined is reading.

Shuffle write is the number of bytes written to disk for use by future tasks, i.e. the data Spark writes to disk so that your map output can be transmitted. For example, if you run a join and the data needs to be shuffled to other nodes, this is the data that will be transferred to those nodes.
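For illustration, a join like the one below forces a shuffle (this is a minimal, made-up example, not from the question; it assumes an existing SparkContext named sc, and the keys and values are purely illustrative):

// joining requires all records with the same key to end up in the same task,
// so the map-side tasks first write their partitioned output to local disk
// ("Shuffle Write"); the tasks of the next stage then fetch it ("Shuffle Read")
val orders = sc.parallelize(Seq((1, "order-a"), (2, "order-b"), (3, "order-c")))
val payments = sc.parallelize(Seq((1, 42.0), (3, 7.5)))

val joined = orders.join(payments)   // RDD[(Int, (String, Double))]
joined.collect().foreach(println)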

Tasks don't run directly on cores; tasks run on executors, and each executor in turn uses the cores.
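As a sketch of how that mapping is usually configured on a cluster (the master URL and the values below are placeholders, not the asker's local[4] setup):

import org.apache.spark.{SparkConf, SparkContext}

// hypothetical standalone cluster; master URL and sizes are placeholders
val conf = new SparkConf()
  .setAppName("executor-cores-example")
  .setMaster("spark://master:7077")
  .set("spark.executor.memory", "2g")
  .set("spark.executor.cores", "4")  // each executor JVM runs up to 4 tasks at once

// tasks are scheduled onto executors; an executor uses its cores to run
// several tasks concurrently, so a task occupies a core only indirectly
val sc = new SparkContext(conf)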

Please also go through this link for a better understanding of the same.

Upvotes: 8

Sietse

Reputation: 201

Not everything gets printed in the logs, and especially not your custom code (unless you log it yourself). When something runs for too long, you may want to take a thread dump on one of the executors and look at the stacks to see how far your computation has progressed.

Upvotes: 0
