Mohini

Reputation: 117

Bypassing the shuffle stage of a MapReduce job in Hadoop?

I am trying to implement an algorithm that needs only a single reducer and runs as an iterative MapReduce job. In each iteration, the results of all mappers are added together in the reducer and then processed, and the reducer's output is passed as input to the mappers in the next iteration. I want to execute the job asynchronously, i.e. as soon as a predefined number of mappers have finished, pass their output directly to the reducer, avoiding the shuffle and sort, which only add overhead for my algorithm. Is that even possible? If not, what can be done at the implementation level to execute a MapReduce job asynchronously? I went through a number of research papers but could not get any ideas from them.

Thanks.

Upvotes: 2

Views: 184

Answers (1)

Armin Braun

Reputation: 3683

You have to code up your own custom solution for this. I did a similar thing in a project recently.

It requires a bit of code, so I can only outline the steps here :)

  • Set mapreduce.job.reduce.slowstart.completedmaps to 0.0 so that the reducer comes up before the mappers finish (this will give you a speedup right away, by the way; try it out before going ahead with the steps below ;) maybe it's enough)
  • Implement your own org.apache.hadoop.mapred.MapOutputCollector that writes the map output to a socket instead of to the standard shuffle path (this is the mapper side)
  • Implement your own org.apache.hadoop.mapred.ShuffleConsumerPlugin that waits for connections from the mappers and reads key/value pairs from the network (this is the reducer side)
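To make the socket handoff in the last two bullets concrete, here is a minimal sketch using only the Java standard library. It does not implement the Hadoop MapOutputCollector or ShuffleConsumerPlugin interfaces; the class name SocketShuffleSketch, the key "partial", and the idea of summing integer values are all assumptions made for illustration. Threads stand in for mappers, and a ServerSocket on the loopback interface stands in for the reducer.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the custom mapper-side and reducer-side components.
public class SocketShuffleSketch {

    /** "Mapper side": streams (key, value) pairs straight to the reducer's socket. */
    static void sendPairs(int port, String key, int[] values) throws IOException {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            for (int v : values) {
                out.writeUTF(key);
                out.writeInt(v);
            }
        }
    }

    /** "Reducer side": accepts one connection per mapper and adds up values in arrival order, with no sort. */
    static long receiveAndSum(ServerSocket server, int mapperCount) throws IOException {
        long sum = 0;
        for (int i = 0; i < mapperCount; i++) {
            try (Socket socket = server.accept();
                 DataInputStream in = new DataInputStream(socket.getInputStream())) {
                while (true) {
                    try {
                        in.readUTF();          // key is read but ignored in this sketch
                        sum += in.readInt();
                    } catch (EOFException endOfStream) {
                        break;                 // this mapper closed its connection
                    }
                }
            }
        }
        return sum;
    }

    /** Runs everything in one process; the listener is bound before any mapper thread starts. */
    static long run(int[][] mapperOutputs) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            List<Thread> mappers = new ArrayList<>();
            for (int[] output : mapperOutputs) {
                Thread t = new Thread(() -> {
                    try {
                        sendPairs(port, "partial", output);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                mappers.add(t);
                t.start();
            }
            long sum = receiveAndSum(server, mapperOutputs.length);
            for (Thread t : mappers) {
                t.join();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(new int[][] {{1, 2, 3}, {10, 20}, {100}}));
    }
}
```

In a real job the two halves would live in separate tasks on different hosts, so the reducer's address would have to be published somewhere the mappers can find it, which is where the coordination step comes in.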

Things you will need to do:

  • Synchronize the tasks so that the mappers do not start sending before the reducer is actually listening (ZooKeeper is what I used here)
  • Adjust your job config to use the custom mapper-side and reducer-side components
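As a rough sketch of the config adjustment, the snippet below collects the relevant job properties and renders them as -D options. The property names for the collector and the shuffle plugin are the ones I understand the pluggable shuffle documentation to use, so check them against your Hadoop version. The class names com.example.SocketMapOutputCollector and com.example.SocketShuffleConsumer are hypothetical placeholders for your own implementations, and the -D form assumes your driver parses generic options (for example via ToolRunner).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; it only builds strings and does not touch Hadoop itself.
public class ShuffleJobSettings {

    /** Collects the job properties discussed above, in a stable order. */
    static Map<String, String> settings(String collectorClass, String shufflePluginClass) {
        Map<String, String> s = new LinkedHashMap<>();
        s.put("mapreduce.job.reduces", "1");                         // single reducer
        s.put("mapreduce.job.reduce.slowstart.completedmaps", "0.0"); // start the reducer early
        s.put("mapreduce.job.map.output.collector.class", collectorClass);
        s.put("mapreduce.job.reduce.shuffle.consumer.plugin.class", shufflePluginClass);
        return s;
    }

    /** Renders the properties as "-D key=value" options separated by spaces. */
    static String toGenericOptions(Map<String, String> s) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : s.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append("-D ").append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder class names; substitute your own implementations.
        System.out.println(toGenericOptions(settings(
                "com.example.SocketMapOutputCollector",
                "com.example.SocketShuffleConsumer")));
    }
}
```

The same keys can also be set programmatically on the job's Configuration object in the driver instead of being passed on the command line.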

Further reading: https://hadoop.apache.org/docs/r2.6.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/PluggableShuffleAndPluggableSort.html

Def. doable, but requires some effort :)

Upvotes: 3
