Ian Macalinao

Reputation: 1668

Finding the execution time of each step of a Spark stage

How could I find the duration of each step in a Spark stage?

I'd like to figure out which step exactly is the bottleneck of my job.

Upvotes: 2

Views: 2894

Answers (2)

Rakesh Rakshit

Reputation: 592

You can refer to the class StreamingJobProgressListener, which is Spark's default implementation of StreamingListener for capturing job progress metrics in a streaming application.

This listener can be fetched as follows:

JavaStreamingContext jssc = new JavaStreamingContext(sparkconf, Durations.seconds(60));
StreamingJobProgressListener progressListener = jssc.ssc().progressListener();

You can explore the progressListener.onStageSubmitted, progressListener.onStageCompleted, progressListener.onTaskStart and progressListener.onTaskEnd functions to get the metrics you need.

Upvotes: 3

pltc325

Reputation: 75

I don't think the Spark UI can give you detailed performance metrics for individual transformations inside a stage, such as map or flatMap, because Spark pipelines those operations together as an optimization, so they are not timed separately.

You could, however, insert a collect() action and a timer between these transformations to approximate it.
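As a rough sketch of that pattern, the example below uses plain Java streams as a stand-in for an RDD, so it runs without a cluster. Like Spark transformations, stream map and flatMap are lazy and fused until a terminal operation runs. The class StepTimer and the helper timed are hypothetical names made up for illustration, not part of Spark. In a real job the body of each step would be something like rdd.map(...).collect(), and the numbers would only be approximate, since forcing each step to materialize disables the pipelining you are trying to measure and collect() also pulls all the data back to the driver.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical helper class, not a Spark API: times each forced step.
public class StepTimer {
    // Elapsed wall-clock time per step, in nanoseconds, in insertion order.
    static final Map<String, Long> timings = new LinkedHashMap<>();

    // Runs one step, records how long it took, and returns its result.
    static <T> T timed(String label, Supplier<T> step) {
        long start = System.nanoTime();
        T result = step.get();
        timings.put(label, System.nanoTime() - start);
        return result;
    }

    static List<String> run(List<String> lines) {
        // Step 1: the terminal collect() forces the lazy map to execute now.
        List<String> upper = timed("map", () ->
            lines.stream().map(String::toUpperCase).collect(Collectors.toList()));
        // Step 2: flatMap runs on the materialized output of step 1.
        return timed("flatMap", () ->
            upper.stream()
                 .flatMap(line -> Arrays.stream(line.split(" ")))
                 .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> words = run(Arrays.asList("a b", "c d e"));
        System.out.println(words);
        timings.forEach((step, nanos) ->
            System.out.println(step + ": " + nanos / 1_000_000.0 + " ms"));
    }
}
```

Once the slow step is identified, the extra collect() calls and timers should be removed so the transformations can be pipelined again.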

This post explains how to use the Spark UI to debug applications: "Understanding your Apache Spark application through visualization".

Upvotes: 1
