Reputation: 133
I'm testing shutting down servers via a UPS while Hadoop tasks are running, and I have two questions.
1. Can a running task be saved so that it resumes its remaining work after all nodes have rebooted?
2. If "1" is not supported, is it safe to start the shutdown process while Hadoop tasks are running? Or is there anything I have to do to preserve the Hadoop system (the cluster)?
Upvotes: 2
Views: 122
Reputation: 8572
It is not currently possible to save the state of running tasks in Hadoop. Doing so would be extremely difficult: all resource allocations are based on the current load of the system, and after restarting the entire cluster the workload might be entirely different, so restoring the old state would not make sense.
To answer your second question: Hadoop was designed to tolerate node failures, temporary problems accessing files, and network outages. Individual tasks might fail, and the system then restarts them on another node. From the cluster's point of view it is safe to shut down nodes; the only thing to keep in mind is that the job will ultimately fail and you will need to resubmit it after bringing the cluster back up. One problem that might arise from shutting down the cluster with the power switch is that temporary files do not get cleaned up. This is usually not a major problem.
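If the UPS gives you a little warning time, an orderly stop can replace the power switch. What follows is only a rough sketch, assuming a YARN-based cluster (Hadoop 2 or later) where `yarn`, `stop-yarn.sh` and `stop-dfs.sh` are on the PATH of the node that runs the script; older MapReduce v1 clusters use different commands. The function names are made up for illustration. It kills the applications that are still running (they have to be resubmitted later anyway) and then stops YARN before HDFS.

```python
import subprocess
from typing import List


def parse_running_app_ids(listing: str) -> List[str]:
    """Pull application IDs out of `yarn application -list` output.

    Data rows start with an ID such as application_<timestamp>_<n>;
    the summary and column-header lines do not, so they are skipped.
    """
    ids = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("application_"):
            ids.append(fields[0])
    return ids


def shutdown_plan(app_ids: List[str]) -> List[List[str]]:
    """Build the command sequence: kill running apps, then stop YARN, then HDFS."""
    plan = [["yarn", "application", "-kill", app_id] for app_id in app_ids]
    plan.append(["stop-yarn.sh"])  # assumption: Hadoop's sbin directory is on PATH
    plan.append(["stop-dfs.sh"])
    return plan


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, continuing past failures."""
    for cmd in plan:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=False)


def main(dry_run: bool = True) -> None:
    """Query YARN for running applications and carry out the shutdown plan."""
    result = subprocess.run(
        ["yarn", "application", "-list", "-appStates", "RUNNING"],
        capture_output=True,
        text=True,
        check=False,
    )
    run_plan(shutdown_plan(parse_running_app_ids(result.stdout)), dry_run=dry_run)
```

Calling `main()` from whatever hook your UPS software provides only prints the commands; `main(dry_run=False)` actually runs them. None of this preserves job progress, it just avoids a hard power-off while daemons are still writing.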
Upvotes: 1
Reputation: 12522
No, you can't "save" a task in an intermediate state. If you shut down Hadoop while some jobs are running, you could end up with intermediate data from abandoned jobs occupying space. Apart from that, you can shut down the system while jobs are running.
Upvotes: 2