Hard Worker

Reputation: 1111

Hadoop MapReduce job on a single node vs. multiple nodes

Hey, I have written my first Java code for MapReduce and have run it on a single node.

But I'm not sure what changes, if any, it needs in order to work on multiple nodes. Could someone point me in the right direction?

Upvotes: 0

Views: 464

Answers (1)

vefthym

Reputation: 7462

A good starting point is to follow this tutorial.

The main points that you should have a look at are:

  • the /etc/hosts file of each node, where you add the IP addresses of all the nodes (also make sure you can ssh to each node without a password)
  • the $HADOOP_HOME/conf/masters and $HADOOP_HOME/conf/slaves files on the master node, where you add the corresponding nodes
  • increase the number of reduce tasks, in case it is 1 and your algorithm supports more than one. You can do that in your main method, by calling the setNumReduceTasks(int n) method (instructions on setting this can be found here).
  • set the replication factor, in case it is 1 (the default is 3), to take advantage of data locality (data is copied to more nodes, so some data transfer can be avoided).
  • configure the *-site.xml files, as instructed in the provided tutorial.

Of course, you should stop the cluster before making these changes and restart it afterwards.
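To illustrate the point about reduce tasks: with more than one reducer, every key is routed to exactly one reducer, so an algorithm "supports" several reducers only if it never needs to see all keys in one place. The following is a rough, plain-Java sketch of that routing and does not use Hadoop itself. The class and method names (ReducerAssignment, partitionFor, assign) are made up for illustration. The formula mirrors the one in Hadoop's default HashPartitioner, but a real job hashes the Writable key (e.g. Text), so the actual reducer indices will differ from what this prints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: these names are hypothetical and not part of the Hadoop API.
class ReducerAssignment {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups keys by the index of the reducer they would be sent to.
    static Map<Integer, List<String>> assign(List<String> keys, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            int reducer = partitionFor(key, numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple", "date");
        // With one reducer, every key lands in partition 0.
        System.out.println("1 reducer:  " + assign(keys, 1));
        // With three reducers, equal keys still end up in the same partition.
        System.out.println("3 reducers: " + assign(keys, 3));
    }
}
```

If your reduce logic only needs all values of a single key together (word count, for example), raising the number of reducers is safe. If it needs a global view (a single overall total or a global sort, say), it would need extra work before it can use more than one reducer.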

Upvotes: 1
