iamorozov

Reputation: 811

How does Apache Zeppelin compute the Spark job progress bar?

When you start a Spark job from the Apache Zeppelin notebook interface, it shows a progress bar for the job execution. But what does this progress actually mean? Sometimes it shrinks or expands. Is it the progress of the current stage or of the whole job?

Upvotes: 1

Views: 650

Answers (1)

Double Sept

Reputation: 326

In the web interface, the progress bar shows the value returned by the interpreter's getProgress function (which is not implemented for every interpreter, such as Python).

This function returns a percentage.

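When using the Spark interpreter, the value seems to be the percentage of completed tasks, counted across all stages of all jobs in the job group rather than for the current stage only. It comes from the following progress function in JobProgressUtil: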

import org.apache.spark.SparkContext

def progress(sc: SparkContext, jobGroup: String): Int = {
    // All jobs submitted under this job group
    val jobIds = sc.statusTracker.getJobIdsForGroup(jobGroup)
    val jobs = jobIds.flatMap { id => sc.statusTracker.getJobInfo(id) }
    // Every stage of every job in the group that the status tracker knows about
    val stages = jobs.flatMap { job =>
      job.stageIds().flatMap(sc.statusTracker.getStageInfo)
    }

    val taskCount = stages.map(_.numTasks).sum
    val completedTaskCount = stages.map(_.numCompletedTasks).sum
    if (taskCount == 0) {
      0
    } else {
      // Completed tasks as a percentage of all known tasks, truncated to an Int
      (100 * completedTaskCount.toDouble / taskCount).toInt
    }
}
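
This might also explain why the bar sometimes shrinks. The denominator is the total number of tasks in the stages known at that moment, so if more stages or jobs show up in the same job group later, the percentage can drop even though nothing was undone. Here is a minimal sketch of that arithmetic without Spark. StageSnapshot, ProgressSketch and the task counts are made up for illustration, and I am assuming the set of visible stages can grow while the paragraph runs:

```scala
// Hypothetical stand-in for the numTasks / numCompletedTasks values of a Spark stage.
final case class StageSnapshot(numTasks: Int, numCompletedTasks: Int)

object ProgressSketch {
  // Same arithmetic as the progress function above, applied to plain values.
  def percent(stages: Seq[StageSnapshot]): Int = {
    val taskCount = stages.map(_.numTasks).sum
    val completedTaskCount = stages.map(_.numCompletedTasks).sum
    if (taskCount == 0) 0
    else (100 * completedTaskCount.toDouble / taskCount).toInt
  }

  // Only one stage is known so far: 8 of 10 tasks are done.
  val before: Int = percent(Seq(StageSnapshot(10, 8)))

  // Assumed scenario: a second stage with 30 pending tasks becomes visible.
  val after: Int = percent(Seq(StageSnapshot(10, 8), StageSnapshot(30, 0)))
}
```

With these made-up numbers, before is 80 and after is 20, so the bar would jump backwards while the job is actually moving forward.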

However, I could not find this behaviour described in the Zeppelin documentation.

Upvotes: 2
