Reputation: 1668
How could I find the duration of each step in a Spark stage?
I'd like to figure out exactly which step is the bottleneck of my job.
Upvotes: 2
Views: 2894
Reputation: 592
You can refer to the class StreamingJobProgressListener, which is Spark's default implementation of StreamingListener for capturing job progress metrics.
This listener can be fetched as follows:
JavaStreamingContext jssc = new JavaStreamingContext(sparkconf, Durations.seconds(60));
StreamingJobProgressListener progressListener = jssc.ssc().progressListener();
You can explore the progressListener.onStageSubmitted, progressListener.onStageCompleted, progressListener.onTaskStart and progressListener.onTaskEnd callbacks to get the metrics you need.
Upvotes: 3
Reputation: 75
I don't think you can use the Spark UI to get many performance metrics about specific transformations inside a stage, such as map or flatMap, because Spark pipelines those operations together as an optimization.
You could, however, insert a collect() action and a timer between these transformations to approximate it.
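To make the timer idea concrete, here is a minimal sketch in plain Java. StepTimer and StepTimerDemo are hypothetical helpers, not part of Spark, and the demo uses Java streams as a stand-in for RDD transformations so that it runs without a cluster. In a real job, each timed step would be assumed to end in an action such as collect() so that Spark actually executes it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Spark class): records wall-clock time per named step.
class StepTimer {
    // Step name -> elapsed nanoseconds, kept in insertion order.
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();

    // Runs the step, stores its duration under the given name, and returns its result.
    <T> T time(String name, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        elapsedNanos.put(name, System.nanoTime() - start);
        return result;
    }

    Map<String, Long> elapsedNanos() {
        return elapsedNanos;
    }

    // Name of the slowest recorded step, or null if nothing was timed.
    String slowestStep() {
        String slowest = null;
        long max = -1L;
        for (Map.Entry<String, Long> e : elapsedNanos.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                slowest = e.getKey();
            }
        }
        return slowest;
    }
}

class StepTimerDemo {
    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);

        // Stand-in for a Spark step such as rdd.map(...).collect().
        List<Integer> doubled = timer.time("map", () ->
                input.stream().map(x -> x * 2).collect(Collectors.toList()));

        // Stand-in for a second step such as mapped.flatMap(...).collect().
        List<Integer> expanded = timer.time("flatMap", () ->
                doubled.stream().flatMap(x -> Stream.of(x, x + 1)).collect(Collectors.toList()));

        System.out.println("elements after flatMap: " + expanded.size());
        timer.elapsedNanos().forEach((name, nanos) ->
                System.out.println(name + ": " + nanos / 1_000_000.0 + " ms"));
        System.out.println("slowest: " + timer.slowestStep());
    }
}
```

Keep in mind that forcing an action between transformations breaks the pipelining, so the numbers are only an approximation of the cost inside the original stage. Unless the intermediate RDD is cached, each timed step will also re-run the transformations upstream of it.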
Here is a post that could shed light on how to use the Spark UI to debug applications: "Understanding your Apache Spark application through visualization".
Upvotes: 1