Reputation: 887
What's the behavior when a partition is sent to a node and the node crashes right before executing a job? If a new node is introduced into the cluster, what's the entity that detects the addition of this new machine? Does the new machine get assigned the partition that didn't get processed?
Upvotes: 2
Views: 3727
Reputation: 153
The master considers the worker to be failure if it didnt receive the heartbeat message for past 60 sec (according to spark.worker.timeout). In that case the partition is assigned to another worker(remember partitioned RDD can be reconstructed even if its lost).
For the question if the new node is introduced into cluster? the spark-master will not detect the new node addition to the cluster once the slaves are started, because before application-submit in cluster the sbin/start-master.sh
starts the master and sbin/start-slaves.sh
reads the conf/slaves
file (contains IP address of all slaves) in spark-master machine and starts a slave instance on each machine specified. The spark-master will not read this configuration file after being started. so its not possible to add a new node once all slaves started.
Upvotes: 2