Reputation: 127
I have installed Hadoop 2.7.2 in pseudo-distributed mode (machine-1). I want to add a new datanode to it to turn it into a cluster, but the problem is that the two machines have different disk partitions.
I installed the same version of Hadoop 2.7.2 on the new datanode (machine-2), and it can SSH to machine-1. After going through many websites, all of them have tutorials mentioning that we have to have the same configuration files inside the /etc/hadoop/
folder.
With that said, my existing configuration on machine-1 is:
core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/home1/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://CP000187:9000</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.vasanth.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home1/hadoop_data/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home1/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
/home1
is a disk mounted on machine-1. Machine-2 has two disks mounted, namely /hdd1 and /hdd2.
Now, what should I specify in hdfs-site.xml
on the new machine (machine-2) to make use of both /hdd1 and /hdd2?
Does the value of dfs.data.dir
need to be the same on all nodes?
Is the dfs.namenode.name.dir
property required in hdfs-site.xml
on machine-2 (since it is not a namenode)?
My simplified question: is it mandatory to replicate the master node's configuration files on the slave nodes as well? Please help me out on this.
Upvotes: 4
Views: 21464
Reputation: 3374
To add a datanode, follow the steps below; a sketch of the files mentioned is given after the steps.
Copy the core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and hadoop-env.sh
files to the new machine's hadoop conf dir
Add the IP address or hostname of the new datanode in /etc/hosts
Add the IP address of the new datanode in the slaves file
(located in the etc/hadoop/
directory of your Hadoop installation on the namenode)
As you mentioned you have 2 HDDs, mention those locations in the hdfs-site.xml
file like below
<property>
<name>dfs.datanode.data.dir</name>
<value>/hdd1,/hdd2</value>
</property>
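For illustration, here is a minimal sketch of those files. Machine-2's hostname (CP000188), the login user and the IP addresses are assumptions; substitute your own values, and the copy command assumes the same $HADOOP_HOME on both machines:
# copy the config files from machine-1 to machine-2
scp $HADOOP_HOME/etc/hadoop/{core-site.xml,hdfs-site.xml,mapred-site.xml,yarn-site.xml,hadoop-env.sh} vasanth@CP000188:$HADOOP_HOME/etc/hadoop/
# /etc/hosts on both machines (illustrative IPs)
192.168.1.10   CP000187
192.168.1.11   CP000188
# etc/hadoop/slaves on the namenode (machine-1)
CP000188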
Upvotes: 1
Reputation: 151
You just need to copy the entire hadoop folder from node1 to node2, so the configuration on both points to hdfs://CP000187:9000. You don't have to do any additional settings in node2.
To start the datanode in node2, run the following (from sbin). You only need to run the datanode and nodemanager processes in node2:
./hadoop-daemon.sh start datanode
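Since the nodemanager is mentioned as well: assuming YARN is set up on the cluster, it can be started from the same sbin directory with
./yarn-daemon.sh start nodemanager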
To check whether the datanode has been added correctly or not, run dfsadmin -report on node1:
hadoop dfsadmin -report
Output:
Configured Capacity: 24929796096 (23.22 GB)
Present Capacity: 17852575744 (16.63 GB)
DFS Remaining: 17851076608 (16.63 GB)
DFS Used: 1499136 (1.43 MB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (2):
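Note: on Hadoop 2.x the hadoop dfsadmin form is deprecated (it still works but prints a warning); the equivalent command is
hdfs dfsadmin -report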
Upvotes: 1