McKracken

Reputation: 389

Adding a new hard drive or disk partition to only one datanode in HDFS

I have a cluster composed of a master node (which runs only the namenode) and two slaves, slave1 and slave2 (which run the datanodes). Now I want to add a new hard drive to slave1 only, and use it to increase the datanode's capacity. I followed several tutorials and how-tos on the internet, and I understand how to do it in general. My problem is that adding the partition/hard drive only to slave1 causes trouble, because the path to the new partition/hard drive added in hdfs-site.xml won't be found by slave2.

This is what I do on slave1 (the new disk is on sdb):

  1. I run fdisk /dev/sdb to create the partition (the full command sequence is sketched after this list). The process ends without problems, creating /dev/sdb1.
  2. I format sdb1 with mkfs.ext4 /dev/sdb1.
  3. I mount sdb1 on /disk1 with mount /dev/sdb1 /disk1.
  4. I create the datanode directory my/user/hdfs/datanode inside /disk1.
  5. I recursively change the owner of /disk1/my/user/ to give my user permission.
  6. I stop the datanode with hadoop-daemon.sh stop datanode.
  7. I add /disk1/my/user/hdfs/datanode to hdfs-site.xml, under the dfs.datanode.data.dir property, using a comma to separate it from the path already present there. I do this on every machine.
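For reference, steps 1 to 6 boil down to the following shell commands (a sketch of what I run as root; myuser is a placeholder for my actual user):

    # Partition the new disk (interactive; creates /dev/sdb1)
    fdisk /dev/sdb

    # Format the new partition with ext4
    mkfs.ext4 /dev/sdb1

    # Mount it on /disk1
    mkdir -p /disk1
    mount /dev/sdb1 /disk1

    # Create the datanode directory and give my user ownership of it
    mkdir -p /disk1/my/user/hdfs/datanode
    chown -R myuser:myuser /disk1/my/user

    # Stop the datanode before editing hdfs-site.xml
    hadoop-daemon.sh stop datanode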

Now, if I stop and start HDFS again from the master, the datanode on slave2 won't start, because it cannot find the path /disk1/my/user/hdfs/datanode. My question is then: is it possible to add a new partition/hard drive to only one datanode in the cluster? What do I have to do? Do I have to create the same folder on every machine?

Upvotes: 0

Views: 1162

Answers (1)

iamauser

Reputation: 11479

If the two slaves run on two separate machines, you can create a separate hdfs-site.xml for each of them. On slave1 it will have the additional disk listed in dfs.datanode.data.dir, while slave2's will not.
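For example, the relevant property could look like this on each node (a minimal sketch; /my/user/hdfs/datanode stands in for whatever directory is already configured):

    <!-- slave1: hdfs-site.xml lists both the original directory and the new disk -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/my/user/hdfs/datanode,/disk1/my/user/hdfs/datanode</value>
    </property>

    <!-- slave2: hdfs-site.xml keeps only the original directory -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/my/user/hdfs/datanode</value>
    </property>

After changing the file on slave1 only, restarting that datanode should be enough; it will spread new blocks across both directories, and slave2's configuration is untouched.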

Upvotes: 1
