Reputation: 1101
We are running our calculations on a standalone Spark cluster, version 1.0.2 (the previous major release). We do not have any HA or recovery logic configured. A piece of functionality on the driver side consumes incoming JMS messages and submits the corresponding jobs to Spark.
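To illustrate the setup, here is a minimal sketch of such a driver-side consumer (ActiveMQ stands in for the actual JMS provider; the broker URL, queue name, and job body are hypothetical placeholders):

    // Minimal sketch: a JMS listener whose onMessage submits a Spark job.
    // ActiveMQ, the broker URL, queue name, and job body are hypothetical
    // placeholders standing in for the real messaging setup.
    import javax.jms.{Message, MessageListener, Session, TextMessage}
    import org.apache.activemq.ActiveMQConnectionFactory
    import org.apache.spark.{SparkConf, SparkContext}

    object JmsDrivenDriver {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf()
            .setMaster("spark://master-host:7077") // the single, non-HA master
            .setAppName("jms-driven-jobs"))

        val connection =
          new ActiveMQConnectionFactory("tcp://broker-host:61616").createConnection()
        val session  = connection.createSession(false, Session.AUTO_ACKNOWLEDGE)
        val consumer = session.createConsumer(session.createQueue("jobs.queue"))

        // Each incoming message triggers a job on the long-lived SparkContext.
        // When the master goes away, this action does not fail fast with a
        // clear "cluster gone" error -- which is the problem described below.
        consumer.setMessageListener(new MessageListener {
          override def onMessage(message: Message): Unit = message match {
            case text: TextMessage =>
              val n = text.getText.toInt
              println(sc.parallelize(1 to n).map(_ * 2).reduce(_ + _))
            case _ => // ignore non-text messages in this sketch
          }
        })
        connection.start()
      }
    }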
When we bring the single and only Spark master down (for testing), the driver program seems unable to properly detect that the cluster is no longer usable. This results in two major problems:
There are a couple of Akka failure-detection-related properties that you can configure in Spark (see the sketch below), but:
So, can anyone please explain what the designed behavior is when the single Spark master in a standalone deployment fails/stops/shuts down? I wasn't able to find any proper documentation about this on the internet.
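For reference, these are the kinds of settings meant above, set on the driver's SparkConf (the property names come from the Spark 1.x configuration page; the values shown are illustrative placeholders, not recommendations):

    import org.apache.spark.SparkConf

    // Akka failure-detection knobs as documented for Spark 1.x; the values
    // here are illustrative only, not tuning advice.
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077")
      .setAppName("failure-detection-tuning")
      // Interval between Akka heartbeats, in seconds.
      .set("spark.akka.heartbeat.interval", "100")
      // Acceptable heartbeat pause before a peer is suspected dead, in seconds.
      .set("spark.akka.heartbeat.pauses", "60")
      // Sensitivity of Akka's phi-accrual failure detector.
      .set("spark.akka.failure-detector.threshold", "30.0")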
Upvotes: 4
Views: 971
Reputation: 723
By default, Spark can handle Worker failures, but not a Master failure. If the Master crashes, no new applications can be created. For that reason, two high-availability schemes are provided here: https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability
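For reference, both schemes are enabled through master-side properties, typically via SPARK_DAEMON_JAVA_OPTS in conf/spark-env.sh (a sketch based on that documentation page; the ZooKeeper ensemble address and directories are placeholders):

    # ZooKeeper-based standby masters (set on each Master node):
    export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
      -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181 \
      -Dspark.deploy.zookeeper.dir=/spark"

    # Alternatively, single-node recovery backed by a local filesystem directory:
    # export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM \
    #   -Dspark.deploy.recoveryDirectory=/var/lib/spark/recovery"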
Hope this helps,
Le Quoc Do
Upvotes: 1