figs_and_nuts

Reputation: 5771

Do all jobs need to finish for a Spark application to finish?

I used to think that a Spark application finishes when all of its jobs succeed. But then I came across this parameter:

spark.driver.maxResultSize: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
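(For context, this is an ordinary Spark configuration key; here is a minimal sketch of setting it when building the SparkSession, where the application name and the 2g value are just illustrative:)

import org.apache.spark.sql.SparkSession

// Cap the total serialized results collected to the driver at 2g (the default is 1g).
// A job whose collected results exceed this limit is aborted instead of risking a driver OOM.
val spark = SparkSession.builder()
  .appName("max-result-size-demo")            // illustrative name
  .config("spark.driver.maxResultSize", "2g")
  .getOrCreate()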

What happens to the rest of the application when a job is aborted?

As per the answer here describing the parameter spark.driver.maxResultSize,

The goal here is to protect your application from driver loss, nothing more.

How does aborting the job prevent driver loss?

Or, more broadly "What happens to the rest of the application when a job is aborted?"

Upvotes: 1

Views: 584

Answers (2)

Ged

Reputation: 18098

Looking at the title of the question: if there are no issues, the Spark application terminates when the last Stage completes all of its Tasks.

If a serious run-time error occurs that cannot be handled by try/catch, e.g. a driver OOM blow-out or a Task/Node that fails 4 times, then the Spark application terminates and running Tasks are cancelled by the Driver issuing cancel requests. The cluster, tables and files may be left in an inconsistent state. It has been a while since I coded, as I work as a Big Data / Azure Architect these days.
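For what it's worth, the cancel requests mentioned above have a public counterpart on SparkContext; a minimal sketch follows (the group id and application name are illustrative, and this is the user-facing analogue of the cancellation mechanism, not the internal error path itself):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cancel-demo").getOrCreate() // illustrative name
val sc = spark.sparkContext

// Jobs triggered on this thread after setJobGroup belong to the given group;
// interruptOnCancel asks executors to interrupt running task threads on cancel.
sc.setJobGroup("expensive-query", "jobs we may need to abort", interruptOnCancel = true)

// ... actions such as spark.table("some_table").count() would run here ...

// From another thread, the driver can issue cancel requests for the group's
// running jobs and their tasks without terminating the application itself:
sc.cancelJobGroup("expensive-query")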

Upvotes: 0

Kombajn zbożowy

Reputation: 10703

If spark.driver.maxResultSize = 0 (i.e. unlimited) and you try to load a huge amount of data into the driver:

val result = spark.table("huge_table").collect()

it will hit an OOM error and get killed by the scheduler, crashing the entire application.

If maxResultSize is set to some sane value, however, then when the amount of data being fetched to the driver exceeds this threshold, only the job is aborted instead. The driver survives, receives a SparkException, and you have a chance to catch it and recover:

import org.apache.spark.SparkException

val result = try {
  spark.table("huge_table").collect()
} catch {
  case e: SparkException =>
    if (e.getMessage().contains("maxResultSize"))
      // Oops, that was too much: fall back to a small sample instead
      spark.table("huge_table").take(1000)
    else
      throw e
}

Upvotes: 2
