IFH

Reputation: 161

How does Hadoop run the Java reduce function on the DataNodes?

I am confused about how the DataNodes in a Hadoop cluster run the Java code for a job's reduce function. How does Hadoop send Java code to another computer to be executed?

Help me trace the code path by which the master node sends the Java code for the reduce function to a DataNode.

Upvotes: 0

Views: 521

Answers (2)

Durga Viswanath Gadiraju

Reputation: 3956

As shown in the picture, here is what happens:

  • You run the job on the client with the hadoop jar command, passing the jar file name, the class name, and other parameters such as the input and output paths.
  • The client gets a new application ID, then copies the jar file and the other job resources to HDFS with a high replication factor (by default 10 on large clusters).
  • The client then submits the application to the ResourceManager.
  • The ResourceManager keeps track of cluster utilization and launches an ApplicationMaster, which coordinates the job's execution.
  • The ApplicationMaster determines where the input blocks are located, using block locations that come from the NameNode, and then works with the NodeManagers to launch the tasks in the form of containers.
  • Each container runs a JVM that executes a map or reduce task (your mapper and reducer classes). When the JVM is bootstrapped, the job resources stored on HDFS, including the jar, are copied to the node that runs it. For mappers, these JVMs are created, where possible, on the same nodes that hold the data, so the code in the jar typically processes data that is local to that machine.
  • To answer your question: the reducers run in containers on one or more worker nodes, which are usually the same machines as the DataNodes. The Java code is copied to the node as part of the bootstrap process, when the JVM is created. The reducer's input data is fetched from the mappers over the network.
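
One detail the steps above leave implicit is how a freshly started task JVM knows which class to run. The job configuration that travels with the jar carries the reducer's class name as a string, and the task instantiates that class by reflection once the jar is on its classpath. The following is a minimal, Hadoop-free sketch of that idea, not Hadoop's actual code: the plain Map stands in for Hadoop's Configuration, SumReducer is a hypothetical reducer, and the key name mirrors the mapreduce.job.reduce.class property used by Hadoop 2 and later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReducerLookupSketch {

    /** Hypothetical stand-in for a user's reducer: adds two partial counts. */
    public static class SumReducer implements BinaryOperator<Integer> {
        @Override
        public Integer apply(Integer a, Integer b) {
            return a + b;
        }
    }

    /** "Client side": the job configuration records only the reducer's class NAME. */
    public static Map<String, String> buildJobConf() {
        Map<String, String> conf = new HashMap<>(); // assumption: stands in for Hadoop's Configuration
        conf.put("mapreduce.job.reduce.class", SumReducer.class.getName());
        return conf;
    }

    /** "Task side": read the name from the configuration and instantiate it reflectively. */
    @SuppressWarnings("unchecked")
    public static BinaryOperator<Integer> instantiateReducer(Map<String, String> conf)
            throws Exception {
        String className = conf.get("mapreduce.job.reduce.class");
        // This lookup only succeeds if the class's bytecode is on the classpath,
        // which is why the jar has to reach the node before the task starts.
        Class<?> cls = Class.forName(className);
        return (BinaryOperator<Integer>) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = buildJobConf();
        BinaryOperator<Integer> reducer = instantiateReducer(conf);
        int total = reducer.apply(reducer.apply(1, 2), 3);
        System.out.println(conf.get("mapreduce.job.reduce.class") + " -> " + total);
    }
}
```

In a real job, the class name is recorded when the driver calls Job.setReducerClass, so no Java source or object is ever sent to the node, only the jar and the configuration.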

Figure: Anatomy of a MapReduce job using YARN

Upvotes: 1

Dhruv Kapatel

Reputation: 893

No. Reduce functions are executed on the data nodes. Hadoop transfers the packaged code (jar files) to the data nodes that are going to process the data. At run time, those nodes download the code and run their tasks.
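
To make "transfer a jar, then run it" concrete, here is a small self-contained sketch using only the JDK. It is an illustration of the general mechanism, not Hadoop's implementation: UpperCaseTask is a hypothetical piece of user code, a temporary file plays the role of the jar downloaded from HDFS, and a URLClassLoader with no parent stands in for the separate JVM on the worker node.

```java
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarShippingSketch {

    /** Hypothetical user code that gets packaged into the job jar. */
    public static class UpperCaseTask implements Function<String, String> {
        @Override
        public String apply(String value) {
            return value.toUpperCase();
        }
    }

    /** "Client side": package the compiled class into a jar file. */
    public static Path buildJobJar() throws Exception {
        String entryName = UpperCaseTask.class.getName().replace('.', '/') + ".class";
        Path jar = Files.createTempFile("job", ".jar");
        try (InputStream bytecode =
                     JarShippingSketch.class.getClassLoader().getResourceAsStream(entryName);
             JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            if (bytecode == null) {
                throw new IllegalStateException("compiled class not found: " + entryName);
            }
            out.putNextEntry(new JarEntry(entryName));
            bytecode.transferTo(out); // copy the .class bytes into the jar
            out.closeEntry();
        }
        return jar;
    }

    /** "Worker side": load the class from the jar alone and run it. */
    @SuppressWarnings("unchecked")
    public static String runFromJar(Path jar, String className, String input) throws Exception {
        // The parent loader is null, so the class cannot be found on this
        // program's own classpath; it has to come out of the jar.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null)) {
            Class<?> cls = Class.forName(className, true, loader);
            Function<String, String> task =
                    (Function<String, String>) cls.getDeclaredConstructor().newInstance();
            return task.apply(input);
        }
    }

    public static void main(String[] args) throws Exception {
        Path jar = buildJobJar();
        System.out.println(runFromJar(jar, UpperCaseTask.class.getName(), "hello"));
        Files.delete(jar);
    }
}
```

The point of the sketch is that what moves between machines is compiled bytecode in a jar plus a class name; the receiving JVM only needs the jar on its classpath to run the code.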

Upvotes: 0
