Remove a Hadoop worker node that is also the NameNode

Question

I recently created a cluster with five servers: master, node01, node02, node03, node04.

To have more "workers", I added the NameNode to the list of slaves in /etc/hadoop/slaves.

This works: the master performs some MapReduce jobs.

Today I want to remove this node from the workers list (the jobs are too CPU-intensive for it). I want to set dfs.exclude in my hdfs-site.xml, but I am worried because this node is also the master server.

Could someone confirm that there is no risk in performing this operation?

Thanks, Romain.

Lauri Peltonen · Accepted Answer

If there is data stored on the master node (as there probably is, since it is also a DataNode), removing it means you essentially lose that copy of the data. But if your replication factor is greater than 1 (the default is 3), this doesn't matter: Hadoop will notice that some blocks are under-replicated and will re-replicate them on other DataNodes until the replication factor is reached again.

So, if your replication factor is greater than 1 (and the cluster is otherwise healthy), you can just remove the master's DataNode data (making it a NameNode only again) and Hadoop will take care of the rest.
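
To make the reasoning above concrete, here is a minimal Python sketch of the check involved. It is a toy model, not Hadoop code: the function name `classify_blocks` and the block layout are made up for illustration, and a real cluster tracks this inside the NameNode. It only shows why blocks with another replica survive the removal while a block stored solely on the removed node does not.

```python
from typing import Dict, List, Set, Tuple


def classify_blocks(
    block_locations: Dict[str, Set[str]],
    node: str,
    replication_factor: int,
) -> Tuple[List[str], List[str]]:
    """Return (lost, under_replicated) block ids once `node` is removed.

    A block is lost if `node` held its only replica. It is under-replicated
    if replicas remain but there are fewer than `replication_factor` of them.
    """
    lost: List[str] = []
    under_replicated: List[str] = []
    for block_id, holders in sorted(block_locations.items()):
        remaining = holders - {node}
        if not remaining:
            lost.append(block_id)
        elif len(remaining) < replication_factor:
            under_replicated.append(block_id)
    return lost, under_replicated


# Hypothetical layout: which nodes hold a replica of each block.
block_locations = {
    "blk_1": {"master", "node01", "node02"},  # still has two replicas elsewhere
    "blk_2": {"node01", "node03", "node04"},  # not stored on master at all
    "blk_3": {"master"},                      # only copy lives on master
}

lost, under = classify_blocks(block_locations, "master", replication_factor=3)
print("lost:", lost)
print("under-replicated:", under)
```

In this made-up layout, `blk_1` would simply be re-replicated to another DataNode, whereas `blk_3` has no other copy to replicate from. That second case is what a replication factor of 1 looks like, which is why the answer depends on the replication factor being greater than 1.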
