figs_and_nuts

Reputation: 5771

Do all jobs need to finish for a Spark application to finish?

I used to think that a Spark application finishes when all of its jobs succeed. But then I came across this parameter:

spark.driver.maxResultSize: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
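(For context, this is an ordinary Spark configuration key; here is a minimal sketch of setting it when building the SparkSession, where the application name and the 2g value are just illustrative:)

import org.apache.spark.sql.SparkSession

// Cap the total serialized results collected to the driver at 2g (the default is 1g).
// A job whose collected results exceed this limit is aborted instead of risking a driver OOM.
val spark = SparkSession.builder()
  .appName("max-result-size-demo")            // illustrative name
  .config("spark.driver.maxResultSize", "2g")
  .getOrCreate()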

What happens to the rest of the application when a job is aborted?

As per the answer here describing the parameter spark.driver.maxResultSize,

The goal here is to protect your application from driver loss, nothing more.

How does aborting the job prevent driver loss?

Or, more broadly "What happens to the rest of the application when a job is aborted?"

Upvotes: 1

Views: 584

Answers (2)

Ged

Reputation: 18098

Looking at the title of the question: if there are no issues, the Spark application terminates when the last Stage completes all of its Tasks.

If a serious run-time error occurs that cannot be handled by try/catch, e.g. a driver OOM blow-out or a Task/Node that fails 4 times, then the Spark application terminates and running Tasks are cancelled by the Driver issuing cancel requests. The cluster, tables and files may be left in an inconsistent state. It has been a while since I coded, as I work as a Big Data / Azure Architect these days.
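For what it's worth, the cancel requests mentioned above have a public counterpart on SparkContext; a minimal sketch follows (the group id and application name are illustrative, and this is the user-facing analogue of the cancellation mechanism, not the internal error path itself):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cancel-demo").getOrCreate() // illustrative name
val sc = spark.sparkContext

// Jobs triggered on this thread after setJobGroup belong to the given group;
// interruptOnCancel asks executors to interrupt running task threads on cancel.
sc.setJobGroup("expensive-query", "jobs we may need to abort", interruptOnCancel = true)

// ... actions such as spark.table("some_table").count() would run here ...

// From another thread, the driver can issue cancel requests for the group's
// running jobs and their tasks without terminating the application itself:
sc.cancelJobGroup("expensive-query")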

Upvotes: 0

Kombajn zbożowy

Reputation: 10703

If spark.driver.maxResultSize = 0 (i.e. unlimited) and you try to load a huge amount of data into the driver:

val result = spark.table("huge_table").collect()

it will hit an OOM error and get killed by the scheduler, crashing the entire application.

If maxResultSize is set to some sane value, however, then when the amount of data being fetched to the driver exceeds this threshold, only the job is aborted instead. The driver survives, receives a SparkException, and you have a chance to catch it and recover:

import org.apache.spark.SparkException

val result = try {
  spark.table("huge_table").collect()
} catch {
  case e: SparkException =>
    if (e.getMessage().contains("maxResultSize"))
      // Oops, that was too much: fall back to a small sample instead
      spark.table("huge_table").take(1000)
    else
      throw e
}

Upvotes: 2
