Reputation: 11
Can anyone please clarify what happens to the data already in an existing Hadoop cluster when an additional data node is added? Will the existing data be automatically rebalanced onto the new node?
Upvotes: 1
Views: 1900
Reputation: 33495
New blocks written to HDFS can be placed on the new data node as soon as it joins the cluster, so it starts filling up as new data arrives. The existing blocks on the other nodes won't be moved to the new node automatically. To spread them across the new and the old data nodes, run the balancer with the start-balancer.sh script; stop-balancer.sh stops a running balancer.
Check this article for more information on the same.
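To make the balancing idea concrete, here is a minimal Python sketch of the check the balancer is built around: a data node counts as balanced when its utilization (used space as a percentage of its capacity) is within a threshold of the cluster-wide utilization. The default threshold is 10 percent, and it can be changed with the balancer's `-threshold` option. The function name, node names, and numbers below are made up for illustration, and this is a simplified model, not the balancer's actual code.

```python
def classify_nodes(nodes, threshold=10.0):
    """Classify data nodes against the cluster-wide utilization.

    nodes: dict mapping a (hypothetical) node name to (used, capacity),
           both in the same unit.
    threshold: allowed deviation in percentage points (assumed default: 10).
    Returns (cluster_pct, {name: "over" | "under" | "balanced"}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    status = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold:
            status[name] = "over"      # candidate source of blocks
        elif node_pct < cluster_pct - threshold:
            status[name] = "under"     # candidate target for blocks
        else:
            status[name] = "balanced"
    return cluster_pct, status


# Two fairly full old nodes plus one freshly added, empty node.
nodes = {"dn1": (80, 100), "dn2": (70, 100), "dn3-new": (0, 100)}
cluster_pct, status = classify_nodes(nodes)
print(cluster_pct, status)
```

In this example the empty new node drags the cluster-wide utilization down, so both old nodes end up over-utilized and the new node under-utilized. That is the situation the balancer corrects by moving blocks from the former to the latter.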
Upvotes: 1