Hadoop MapReduce job on a single node vs. multiple nodes

Question

Hey, I have written my first Java code for MapReduce and have run it on a single node.

But I'm not sure what changes, if any, it needs in order to work on multiple nodes. Could someone point me in the right direction?

vefthym · Accepted Answer

A good starting point is to follow this tutorial.

The main points that you should have a look at are:

the /etc/hosts file of each node, where you add the IP addresses of all the nodes (also make sure you can ssh to each node without a password)
the $HADOOP_HOME/conf/masters and $HADOOP_HOME/conf/slaves files on the master node, where you add the corresponding nodes
the number of reduce tasks: increase it, in case it is 1 and your algorithm supports more than one reducer. You can do that in your main method, by calling the setNumReduceTasks(int n) method (instructions on setting this can be found here).
the replication factor: raise it in case it is 1 (the default is 3), to take advantage of data locality (each block is copied to more nodes, so some data transfer can be avoided).
the *-site.xml files, which you configure as instructed in the provided tutorial.

Of course, you should stop the cluster before making these changes and restart it afterwards.
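To see what "your algorithm supports that" means for the number of reduce tasks, here is a minimal sketch of how keys get spread over several reducers. It uses the same formula as Hadoop's default HashPartitioner, but it is plain Java with no Hadoop dependency: the class name, the method names and the sample keys are made up for illustration, and String.hashCode() stands in for the hash of a real Writable key (so the actual reducer numbers on a cluster may differ).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
class ReducerPartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take it modulo the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups the given keys by the reducer index they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        // Sample keys, assumed for the example only.
        List<String> keys = List.of("apple", "banana", "cherry", "date", "elderberry");
        int numReduceTasks = 3; // the value you would pass to setNumReduceTasks(int n)

        assign(keys, numReduceTasks).forEach((reducer, assigned) ->
                System.out.println("reducer " + reducer + " -> " + assigned));
    }
}
```

All values of one key always end up at the same reducer, but no reducer sees every key, and each reducer writes its own output file. So a job that needs a global view in the reduce phase (a single overall total or one globally sorted file, for example) will not work unchanged with more than one reduce task.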
