Reputation: 895
I have a Linux cluster with 9 nodes and I have installed Hadoop 1.0.2. I have a GIS program that I am running using multiple slaves. I need to measure the speedup of my program using, say, 1, 2, 3, 4 .. 8 slave nodes. I use the start-all.sh/stop-all.sh scripts to start/stop my cluster after changing the number of slaves in the conf/slaves file. But I am getting weird errors while doing so, and it feels like I am not using the correct technique to add/remove slave nodes in the cluster.
Any help regarding the ideal "technique to make changes in slaves file and to restart the cluster" will be appreciated.
Upvotes: 1
Views: 1128
Reputation: 39893
The problem is likely that you are not allowing Hadoop to gracefully remove the nodes from the system.
What you want to do is decommission the nodes so that HDFS has time to re-replicate their blocks elsewhere. The process is essentially to add the nodes you want to remove to an excludes file, then run `bin/hadoop dfsadmin -refreshNodes`, which re-reads the configuration and refreshes the cluster's view of the nodes.
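For example, on Hadoop 1.x the excludes file is wired up through the `dfs.hosts.exclude` property in `conf/hdfs-site.xml`; the file path and the slave hostnames below are placeholders for your own setup:

```shell
# conf/hdfs-site.xml -- point the NameNode at an excludes file
# (the path is an example; use an absolute path on your NameNode host):
#   <property>
#     <name>dfs.hosts.exclude</name>
#     <value>/home/hadoop/hadoop-1.0.2/conf/excludes</value>
#   </property>

# List the hostnames of the slaves you want to decommission:
echo "slave7" >> conf/excludes
echo "slave8" >> conf/excludes

# Ask the NameNode to re-read the configuration and start decommissioning:
bin/hadoop dfsadmin -refreshNodes

# Watch progress; wait until the nodes report "Decommissioned"
# before stopping their daemons or editing conf/slaves:
bin/hadoop dfsadmin -report
```

To bring a node back for the next run of your experiment, remove it from the excludes file and run `-refreshNodes` again.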
When adding nodes, and perhaps also after removing them, you should consider running the rebalancer. It spreads the data out evenly across the cluster, which helps avoid the performance hit you may see when newly added nodes hold no data.
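In Hadoop 1.x the rebalancer ships as a helper script; the threshold value below is just an illustrative choice:

```shell
# Start the balancer; -threshold is the allowed deviation (in percent)
# of each DataNode's disk usage from the cluster-wide average.
bin/start-balancer.sh -threshold 5

# The balancer runs until the cluster is balanced within the threshold;
# you can stop it early at any time:
bin/stop-balancer.sh
```

A lower threshold gives a more even distribution but makes the balancer run longer and move more data over the network.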
Upvotes: 1