Reputation: 5771
I used to think that a Spark application finishes when all of its jobs succeed. But then I came across this parameter:
spark.driver.maxResultSize: Limit of total size of serialized results of all partitions for each Spark action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
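(For context, this is an ordinary Spark configuration entry; a minimal sketch of setting it when the session is created, with a purely illustrative value, could look like this:)
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("maxResultSize-demo")                 // illustrative app name
  .config("spark.driver.maxResultSize", "2g")    // example value, not a recommendation
  .getOrCreate()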
What happens to the rest of the application when a job is aborted?
As per the answer here describing the parameter spark.driver.maxResultSize:
The goal here is to protect your application from driver loss, nothing more.
How does aborting the job prevent driver loss?
Or, more broadly "What happens to the rest of the application when a job is aborted?"
Upvotes: 1
Views: 584
Reputation: 18098
Looking at the title of the question: if there are no issues, the Spark App terminates when the last Stage completes all of its Tasks.
If a serious run-time error occurs that cannot be handled with try/catch, e.g. a Driver OOM blow-out or a Task that fails 4x (the default retry limit), then the Spark App terminates and the Driver issues cancel requests for any still-running Tasks. The cluster, tables and files may be left in an inconsistent state. It's a while since I coded, as I work as a Big Data / AZURE Architect these days.
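For reference, the 4x figure corresponds to Spark's default task retry limit, which is controlled by spark.task.maxFailures; a minimal sketch (the value shown is just the default):
import org.apache.spark.sql.SparkSession

// spark.task.maxFailures: number of failures of a particular task tolerated
// before the stage (and with it the job) is aborted; 4 is the default.
val spark = SparkSession.builder()
  .config("spark.task.maxFailures", "4")
  .getOrCreate()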
Upvotes: 0
Reputation: 10703
If spark.driver.maxResultSize = 0 (i.e. unlimited) and you try to load a huge amount of data into the driver:
val result = spark.table("huge_table").collect()
it will hit an OOM error and get killed, crashing the entire application.
If maxResultSize is set to some sane value, however, then when the amount of data being pulled to the driver exceeds that threshold only the job is aborted. The driver survives and receives a SparkException, which you have a chance to catch and recover from:
import org.apache.spark.SparkException

val result = try {
  spark.table("huge_table").collect()
} catch {
  case e: SparkException =>
    if (e.getMessage().contains("maxResultSize")) {
      // Oops, that was too much; fall back to a smaller sample
      spark.table("huge_table").take(1000)
    } else {
      throw e
    }
}
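If you collect from several tables, the same guard can be wrapped in a small helper; a minimal sketch (collectOrSample is a hypothetical name, not a Spark API):
import org.apache.spark.SparkException
import org.apache.spark.sql.{DataFrame, Row}

// Try a full collect; fall back to a limited sample when the
// driver-side result-size limit (spark.driver.maxResultSize) is exceeded.
def collectOrSample(df: DataFrame, fallbackRows: Int): Array[Row] =
  try {
    df.collect()
  } catch {
    case e: SparkException if e.getMessage != null && e.getMessage.contains("maxResultSize") =>
      df.take(fallbackRows)
  }

// usage: val rows = collectOrSample(spark.table("huge_table"), 1000)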
Upvotes: 2