Sanjeev

Reputation: 21

How to delete datanode from hadoop clusters without losing data

I want to remove a datanode from my Hadoop cluster, but I don't want to lose my data. Is there a technique so that the data on the node I am going to delete gets replicated to the remaining datanodes first?

Upvotes: 1

Views: 1329

Answers (2)

Carlos Saltos

Reputation: 1511

  1. Check that all the current datanodes are healthy. For this you can go to the Hadoop master admin console under the Datanodes tab; the address is normally something like http://server-hadoop-master:50070

  2. Add the server you want to delete, using its full domain name, to the file /opt/hadoop/etc/hadoop/dfs.exclude on the Hadoop master and all the current datanodes (your config directory may be different, please double check this)

  3. Refresh the cluster node configuration by running the command hdfs dfsadmin -refreshNodes on the Hadoop master (name node)

  4. Check the state of the server being removed in the "Decommissioning" section of the Hadoop master admin home page. This may take from a couple of minutes to several hours, or even days, depending on the volume of data you have.

  5. Once the server is shown as decommissioned, you may delete it.

NOTE: if you have other services like YARN running on the same server, the process is relatively similar, but using the file /opt/hadoop/etc/hadoop/yarn.exclude and then running yarn rmadmin -refreshNodes from the YARN master node
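The steps above can be sketched as shell commands. This is a minimal sketch, not a definitive runbook: the hostname datanode3.example.com is hypothetical, and the exclude-file paths assume the /opt/hadoop layout from the answer — adjust them to your installation.

```shell
# 1. On the NameNode, add the node to decommission to the exclude file,
#    using its full domain name (hostname is hypothetical):
echo "datanode3.example.com" >> /opt/hadoop/etc/hadoop/dfs.exclude

# 2. Tell the NameNode to re-read its include/exclude lists:
hdfs dfsadmin -refreshNodes

# 3. Watch decommissioning progress from the command line
#    (or in the web UI at http://server-hadoop-master:50070):
hdfs dfsadmin -report

# 4. If YARN also runs on that node, exclude it there too and refresh
#    the ResourceManager:
echo "datanode3.example.com" >> /opt/hadoop/etc/hadoop/yarn.exclude
yarn rmadmin -refreshNodes
```

Only after `hdfs dfsadmin -report` shows the node as "Decommissioned" (not "Decommission in progress") is it safe to shut the server down, since at that point all of its blocks have been re-replicated elsewhere.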

Upvotes: 0

Shravanya

Reputation: 97

What is the replication factor of your Hadoop cluster? If it is the default, which is generally 3, you can delete the datanode directly, since the data automatically gets re-replicated; this process is controlled by the name node. If you changed the replication factor of the cluster to 1, then deleting the node will lose the data on it — there is no other copy to replicate from.
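Since the answer hinges on the replication factor, it is worth checking it before removing anything. A minimal sketch, assuming the Hadoop client is on the PATH and configured for your cluster; the file path is hypothetical:

```shell
# Print the cluster's configured default replication factor
# (the dfs.replication property, typically 3):
hdfs getconf -confKey dfs.replication

# Check the actual replication factor of an existing file --
# it appears as the second column of the listing:
hdfs dfs -ls /user/hypothetical/file.txt

# If a file was written with replication 1, raise it (and wait with -w
# until the extra copies exist) before removing any datanode:
hdfs dfs -setrep -w 3 /user/hypothetical/file.txt
```

Note that dfs.replication is only a default applied at write time; individual files can have been written with a different factor, which is why checking the files themselves matters.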

Upvotes: 3
